space model. In the second part of the paper, we investigate probabilistic and analytical properties of equidistantly
sampled continuous-time state space models and apply our results from the discrete-time setting to derive the
asymptotic properties of the QML estimator of discretely recorded MCARMA processes. Under natural identifiability
conditions, the estimators are again consistent and asymptotically normally distributed for any sampling frequency.
We also demonstrate the practical applicability of our method through a simulation study and a data example from
econometrics.

1. Introduction

Linear state space models have been used in time series analysis and stochastic modelling for many decades
because of their wide applicability and analytical tractability (see, e.g., Brockwell and Davis, 1991;
Hamilton, 1994, for a detailed account). In discrete time they are defined by the equations

$$X_n = F X_{n-1} + Z_{n-1}, \qquad Y_n = H X_n + W_n, \qquad n \in \mathbb{Z}, \tag{1.1}$$

where $X = (X_n)_{n\in\mathbb{Z}}$ is a latent state process, $F$, $H$ are coefficient matrices, and
$Z = (Z_n)_{n\in\mathbb{Z}}$, $W = (W_n)_{n\in\mathbb{Z}}$ are sequences of random variables; see Definition 2.1
for a precise formulation of this model. In this paper we investigate the problem of estimating the coefficient
matrices $F$, $H$ as well as the second moments of $Z$ and $W$ from a sample of observed values of the output
process $Y = (Y_n)_{n\in\mathbb{Z}}$, using a quasi maximum likelihood (QML) or generalized least squares approach.
Given the importance of this problem in practice, it is surprising that a proper mathematical analysis of the QML
estimation for the model (1.1) has only been performed in cases where the model is in the so-called innovations form

$$X_n = F X_{n-1} + K\varepsilon_{n-1}, \qquad Y_n = H X_n + \varepsilon_n, \qquad n \in \mathbb{Z}, \tag{1.2}$$

where the innovations $\varepsilon$ have constant conditional variance and satisfy some higher order moment
conditions (Hannan and Deistler, 1988, Chapter 4). This includes state space models in which the noise sequences
$Z$, $W$ are Gaussian, because then the innovations, which are uncorrelated by definition, form an i.i.d. sequence.
Restriction to these special cases excludes, however, the state space representations of aggregated linear
processes, as well as of equidistantly observed continuous-time linear state space models.

In the first part of the present paper we shall prove consistency (Theorem 2.4) and asymptotic normality
(Theorem 2.5) of the QML estimator for the general linear state space model (1.1) under the assumptions
that the noise sequences $Z$, $W$ are ergodic, and that the output process $Y$ satisfies a strong-mixing condition
in the sense of Rosenblatt (1956). This assumption is not very restrictive, and is, in particular, satisfied
if the noise sequence $Z$ is i.i.d. with an absolutely continuous component, and $W$ is strongly mixing.
Our results are a multivariate generalization of Francq and Zakoïan (1998), who considered the QML
estimation for univariate strongly mixing ARMA processes. The very recent paper Boubacar Mainassara
and Francq (2011), which deals with the structural estimation of weak vector ARMA processes, instead
makes a mixing assumption about the innovations sequence $\varepsilon$ of the process under consideration,
which is very difficult to verify for state space models; their results can therefore not be used for the
estimation of general discretely-observed linear continuous-time state space models.

As alluded to above, one advantage of relaxing the assumption of i.i.d. innovations in a discrete-time
state space model is the inclusion of sampled continuous-time state space models. These were introduced
in the form of continuous-time ARMA (CARMA) models in Doob (1944) as stochastic processes satisfying
the formal analogue of the familiar autoregressive moving average equations of discrete-time ARMA
processes, namely

$$a(D)Y(t) = b(D)DW(t), \qquad D = \mathrm{d}/\mathrm{d}t, \tag{1.3}$$

where $a$ and $b$ are suitable polynomials, and $W$ denotes a Brownian motion. In the recent past, a considerable
body of research has been devoted to these processes. One particularly important extension of the
model (1.3) was introduced in Brockwell (2001), where the driving Brownian motion was replaced by
a Lévy process with finite logarithmic moments. This allowed for a wide range of possibly heavy-tailed
marginal distributions of the process $Y$ as well as the occurrence of jumps in the sample paths, both
characteristic features of many observed time series, e.g. in finance (Cont, 2001). Recently, Marquardt and
Stelzer (2007) further generalized Eq. (1.3) to the multivariate setting, which gave researchers the possibility
to model several dependent time series jointly by one linear continuous-time process. This extension
is important, because many time series exhibit strong dependencies and can therefore not be modelled
adequately on an individual basis. In that paper, the multivariate non-Gaussian equivalent of Eq. (1.3), namely
$P(D)Y(t) = Q(D)DL(t)$, for matrix-valued polynomials $P$ and $Q$ and a Lévy process $L$, was interpreted by
spectral techniques as a continuous-time state space model of the form

$$\mathrm{d}G(t) = \mathcal{A}G(t)\,\mathrm{d}t + \mathcal{B}\,\mathrm{d}L(t), \qquad Y(t) = \mathcal{C}G(t); \tag{1.4}$$

see Eq. (3.4) for an expression of the matrices $\mathcal{B}$ and $\mathcal{C}$. The structural similarity between
Eq. (1.1) and Eq. (1.4) is apparent, and it is essential for many of our arguments. Taking a different route,
multivariate CARMA processes can be defined as the continuous-time analogue of discrete-time vector ARMA models,
described in detail in Hannan and Deistler (1988). As continuous-time processes, CARMA processes are
suited particularly well to model irregularly spaced and high-frequency data, which makes them a flexible
and efficient tool for building stochastic models of time series arising in the natural sciences, engineering
and finance (e.g. Benth and Šaltytė Benth, 2009; Todorov and Tauchen, 2006). In the univariate Gaussian
setting, several different approaches to the estimation problem of CARMA processes have been investigated
(see, e.g., Larsson, Mossberg and Söderström, 2006, and references therein). Maximum likelihood
estimation based on a continuous record was considered in Brown and Hewitt (1975); Feigin (1976); Pham
(1977). Because processes are typically not observed continuously, and because of the limitations of
digital computer processing, inference based on discrete observations has become more important in recent
years; these approaches include variants of the Yule-Walker algorithm for time-continuous autoregressive
processes (Hyndman, 1993), maximum likelihood methods (Brockwell, Davis and Yang, 2011), and
randomized sampling (Rivoira, Moudden and Fleury, 2002) to overcome the aliasing problem. Alternative
methods include discretization of the differential operator (Söderström et al., 1997), and spectral estimation
(Gillberg and Ljung, 2009; Lii and Masry, 1995). For the special case of Ornstein-Uhlenbeck processes,
least squares and moment estimators have also been investigated without the assumption of Gaussianity
(Hu and Long, 2009; Spiliopoulos, 2009).

In the second part of this paper we consider the estimation of general multivariate CARMA (MCARMA)
processes with finite second moments based on equally spaced discrete observations, exploiting the results
about the QML estimation of general linear discrete-time state space models. Under natural identifiability
assumptions we obtain in the main Theorem 3.16 strongly consistent and asymptotically normal estimators
for the coefficient matrices of a second-order MCARMA process and the covariance matrix of the driving
Lévy process, which determine the second-order structure of the process. It is a natural restriction of the
QML method that distributional properties of the driving Lévy process which are not determined by its
covariance matrix cannot be estimated. However, once the autoregressive and moving average coefficients
of a CARMA process are (approximately) known, and if high-frequency observations are available, a parametric
model for the driving Lévy process can be estimated by the methods described in Brockwell and
Schlemm (2012). Thus it should be noted that the paper Brockwell and Schlemm (2012) considers the
same model, but whereas the present paper considers the estimation of the autoregressive and moving average
parameters from equidistant observations letting the number of observations go to infinity, Brockwell
and Schlemm (2012) assume that the autoregressive and moving average parameters are known and show
how to estimate the driving Lévy process and its parameters when both the observation frequency and the
time horizon go to infinity. A further related paper is Schlemm and Stelzer (2012), whose result on the
equivalence of MCARMA processes and state space models provides the foundations for the estimation
procedure considered here. That paper also aimed at using the results of Boubacar Mainassara and Francq
(2011) directly to estimate the autoregressive and moving average parameters of an MCARMA process and
therefore provided conditions for the noise of the induced discrete-time state space model to be strongly
mixing. However, when we investigated this route further it turned out that the approach we take in the
present paper is more general and far more convenient, since any stationary discretely sampled MCARMA
process with finite second moments is strongly mixing, whereas assumptions ensuring a non-trivial absolutely
continuous component of the noise are needed to be able to use the results of Boubacar Mainassara
and Francq (2011). Hence, the approach taken in the present paper appears rather natural for MCARMA
processes. Finally, we note that the estimation of the spectral density of univariate CARMA processes and
the estimation in the case of an infinite variance has recently been considered in Fasen and Fuchs (2012a,b),
and that Fasen (2012) looks at the behaviour of the sample autocovariance function of discretely observed
MCARMA processes in a high frequency limit.

Outline of the paper. The organization of the paper is as follows. In Section 2 we develop a QML
estimation theory for general non-Gaussian discrete-time linear stochastic state space models with finite
second moments. In Section 2.1 we precisely define the class of linear stochastic state space models as well
as the QML estimator. The main results, that under a set of technical conditions this estimator is strongly
consistent and asymptotically normally distributed as the number of observations tends to infinity, are given
as Theorems 2.4 and 2.5 in Section 2.2. The following two Sections 2.3 and 2.4 present the proofs.

In Section 3 we use the results from Section 2 to establish asymptotic properties of a QML estimator
for multivariate CARMA processes which are observed on a fixed equidistant time grid. As a first step, we
review in Section 3.1 their definition as well as their relation to the class of continuous-time state space
models. This is followed by an investigation of the probabilistic properties of a sampled MCARMA process
in Section 3.3 and an analysis of the important issue of identifiability in Section 3.4. Finally, we are able to
state and prove our main result, Theorem 3.16, about the strong consistency and asymptotic normality of
the QML estimator for equidistantly sampled multivariate CARMA processes in Section 3.5.

In the final Section 4, we present canonical parametrizations, and we demonstrate the applicability of
the QML estimation for continuous-time state space models with a simulation study.

Notation. We use the following notation: The space of $m \times n$ matrices with entries in the ring $\mathbb{K}$
is denoted by $M_{m,n}(\mathbb{K})$ or $M_m(\mathbb{K})$ if $m = n$. The set of symmetric matrices is denoted by
$\mathbb{S}_m(\mathbb{K})$, and the symbols $\mathbb{S}_m^{+}(\mathbb{R})$ ($\mathbb{S}_m^{++}(\mathbb{R})$) stand for
the subsets of positive semidefinite (positive definite) matrices, respectively. $A^T$ denotes the transpose of the
matrix $A$, $\operatorname{im} A$ its image, $\ker A$ its kernel, $\sigma(A)$ its spectrum, and
$\mathbf{1}_m \in M_m(\mathbb{K})$ is the identity matrix. The vector space $\mathbb{R}^m$ is identified with
$M_{m,1}(\mathbb{R})$ so that $u = (u^1, \ldots, u^m)^T \in \mathbb{R}^m$ is a column vector. $\|\cdot\|$ represents
the Euclidean norm, $\langle\cdot,\cdot\rangle$ the Euclidean inner product, and $0_m \in \mathbb{R}^m$ the zero
vector. $\mathbb{K}[X]$ ($\mathbb{K}\{X\}$) denotes the ring of polynomial (rational) expressions in $X$ over
$\mathbb{K}$, $I_B(\cdot)$ the indicator function of the set $B$, and $\delta_{n,m}$ the Kronecker symbol. The symbols
$\mathbb{E}$, $\operatorname{Var}$, and $\operatorname{Cov}$ stand for the expectation, variance and covariance
operators, respectively. Finally, we write $\partial_m$ for the partial derivative operator with respect to the $m$th
coordinate and $\nabla = (\partial_1 \ \cdots \ \partial_r)$ for the gradient operator. When there is no ambiguity,
we use $\partial_m f(\vartheta_0)$ and $\nabla_\vartheta f(\vartheta_0)$ as shorthands for
$\partial_m f(\vartheta)|_{\vartheta=\vartheta_0}$ and $\nabla_\vartheta f(\vartheta)|_{\vartheta=\vartheta_0}$, respectively.
A generic constant, the value of which may change from line to line, is denoted by $C$.

2. Quasi maximum likelihood estimation for state space models

In this section we investigate QML estimation for general linear state space models in discrete time, and
prove consistency and asymptotic normality. On the one hand, due to the wide applicability of state space
systems in stochastic modelling and control, these results are interesting and useful in their own right. On
the other hand, in the present paper they will be applied in Section 3 to prove asymptotic properties of the
QML estimator for discretely observed multivariate continuous-time ARMA processes.

Our theory extends existing results from the literature, in particular concerning the QML estimation of
Gaussian state space models, of state space models with independent innovations (Hannan, 1975), and of
weak univariate ARMA processes which satisfy a strong mixing condition (Francq and Zakoïan, 1998).
The techniques used in this section are similar to Boubacar Mainassara and Francq (2011).



2.1. Preliminaries and definition of the QML estimator 

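The general linear stochastic state space model is defined as follows.

Definition 2.1. An $\mathbb{R}^d$-valued discrete-time linear stochastic state space model $(F, H, Z, W)$ of dimension
$N$ is characterized by a strictly stationary $\mathbb{R}^{N+d}$-valued sequence $(Z^T \ W^T)^T$ with mean zero and
finite covariance matrix

$$\mathbb{E}\left[\begin{pmatrix} Z_n \\ W_n \end{pmatrix}\begin{pmatrix} Z_m^T & W_m^T \end{pmatrix}\right] = \delta_{m,n}\begin{pmatrix} Q & R \\ R^T & S \end{pmatrix}, \qquad n, m \in \mathbb{Z}, \tag{2.1}$$

for some matrices $Q \in \mathbb{S}_N^{+}(\mathbb{R})$, $S \in \mathbb{S}_d^{+}(\mathbb{R})$, and $R \in M_{N,d}(\mathbb{R})$;
a state transition matrix $F \in M_N(\mathbb{R})$; and an observation matrix $H \in M_{d,N}(\mathbb{R})$. It consists of a
state equation

$$X_n = F X_{n-1} + Z_{n-1}, \qquad n \in \mathbb{Z}, \tag{2.2a}$$

and an observation equation

$$Y_n = H X_n + W_n, \qquad n \in \mathbb{Z}. \tag{2.2b}$$

The $\mathbb{R}^N$-valued autoregressive process $X = (X_n)_{n\in\mathbb{Z}}$ is called the state vector process, and
$Y = (Y_n)_{n\in\mathbb{Z}}$ is called the output process.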

The assumption that the processes $Z$ and $W$ are centred is not essential for our results, but simplifies
the notation considerably. Basic properties of the output process $Y$ are described in Brockwell and Davis
(1991, §12.1); in particular, if the eigenvalues of $F$ are less than unity in absolute value, then $Y$ has the
moving average representation

$$Y_n = W_n + H\sum_{\nu=1}^{\infty} F^{\nu-1} Z_{n-\nu}, \qquad n \in \mathbb{Z}. \tag{2.3}$$
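
The recursion (2.2) and the moving average representation (2.3) can be checked against each other numerically. The following Python sketch is only an illustration: the function names, the example matrices $F$ and $H$, the Gaussian noise and the start value $X_0 = 0$ are assumptions made here and are not part of Definition 2.1. With $X_0 = 0$ the infinite sum in (2.3) reduces to the finite sum over $\nu = 1, \ldots, n$.

```python
import numpy as np

def simulate_state_space(F, H, Z, W):
    """Simulate X_n = F X_{n-1} + Z_{n-1}, Y_n = H X_n + W_n for n = 1..L.

    Z[k] holds Z_k (k = 0..L-1) and W[k] holds W_{k+1}. The state is started
    at X_0 = 0, an assumption made for illustration only.
    """
    L, N = Z.shape
    X = np.zeros(N)
    Y = np.empty((L, H.shape[0]))
    for k in range(L):
        X = F @ X + Z[k]        # X_{k+1} = F X_k + Z_k
        Y[k] = H @ X + W[k]     # Y_{k+1} = H X_{k+1} + W_{k+1}
    return Y

def moving_average_output(F, H, Z, W, k):
    """Y_{k+1} computed from the moving average form (2.3), truncated at X_0 = 0."""
    acc = np.zeros(F.shape[0])
    F_power = np.eye(F.shape[0])            # F^{nu-1}, starting with nu = 1
    for nu in range(1, k + 2):
        acc += F_power @ Z[k + 1 - nu]      # F^{nu-1} Z_{(k+1)-nu}
        F_power = F_power @ F
    return W[k] + H @ acc

rng = np.random.default_rng(0)
F = np.array([[0.5, 0.2], [0.0, 0.3]])      # eigenvalues 0.5 and 0.3, inside the unit disc
H = np.array([[1.0, 0.0]])
Z = rng.standard_normal((50, 2))
W = 0.1 * rng.standard_normal((50, 1))
Y = simulate_state_space(F, H, Z, W)
Y_last_ma = moving_average_output(F, H, Z, W, 49)
```

Both routes give the same value for the last observation up to rounding, which is the finite-start analogue of (2.3).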

Before we turn our attention to the estimation problem for this class of state space models, we review
the necessary aspects of the theory of Kalman filtering; see Kalman (1960) for the original control-theoretic
account and Brockwell and Davis (1991, §12.2) for a treatment in the context of time series analysis. The
linear innovations of the output process $Y$ are of particular importance for the QML estimation of state
space models.

Definition 2.2. Let $Y = (Y_n)_{n\in\mathbb{Z}}$ be an $\mathbb{R}^d$-valued stationary stochastic process with finite
second moments. The linear innovations $\varepsilon = (\varepsilon_n)_{n\in\mathbb{Z}}$ of $Y$ are then defined by

$$\varepsilon_n = Y_n - P_{n-1}Y_n, \qquad P_n = \text{orthogonal projection onto } \overline{\operatorname{span}}\{Y_\nu : -\infty < \nu \leq n\}, \tag{2.4}$$

where the closure is taken in the Hilbert space of square-integrable random variables with inner product
$(X, Y) \mapsto \mathbb{E}\langle X, Y\rangle$.

This definition immediately implies that the innovations $\varepsilon$ of a stationary stochastic process $Y$ are
stationary and uncorrelated. The following proposition is a combination of Brockwell and Davis (1991,
Proposition 12.2.3) and Hamilton (1994, Proposition 13.2).

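Proposition 2.1. Assume that $Y$ is the output process of the state space model (2.2), that at least one of
the matrices $Q$ and $S$ is positive definite, and that the absolute values of the eigenvalues of $F$ are less than
unity. Then the following hold.

i) The discrete-time algebraic Riccati equation

$$\Omega = F\Omega F^T + Q - \left[F\Omega H^T + R\right]\left[H\Omega H^T + S\right]^{-1}\left[F\Omega H^T + R\right]^T \tag{2.5}$$

has a unique positive semidefinite solution $\Omega \in \mathbb{S}_N^{+}(\mathbb{R})$.

ii) The absolute values of the eigenvalues of the matrix $F - KH \in M_N(\mathbb{R})$ are less than one, where

$$K = \left[F\Omega H^T + R\right]\left[H\Omega H^T + S\right]^{-1} \in M_{N,d}(\mathbb{R}) \tag{2.6}$$

is the steady-state Kalman gain matrix.

iii) The linear innovations $\varepsilon$ of $Y$ are the unique stationary solution to

$$\hat X_n = (F - KH)\hat X_{n-1} + K Y_{n-1}, \qquad \varepsilon_n = Y_n - H\hat X_n, \qquad n \in \mathbb{Z}. \tag{2.7a}$$

Using the backshift operator $B$, which is defined by $B Y_n = Y_{n-1}$, this can be written equivalently as

$$\varepsilon_n = \left\{\mathbf{1}_d - H\left[\mathbf{1}_N - (F - KH)B\right]^{-1}KB\right\} Y_n = Y_n - H\sum_{\nu=1}^{\infty}(F - KH)^{\nu-1}K Y_{n-\nu}. \tag{2.7b}$$

The covariance matrix $V = \mathbb{E}\varepsilon_n\varepsilon_n^T \in \mathbb{S}_d^{+}(\mathbb{R})$ of the innovations $\varepsilon$ is given by

$$V = \mathbb{E}\varepsilon_n\varepsilon_n^T = H\Omega H^T + S. \tag{2.8}$$

iv) The process $Y$ has the innovations representation

$$\hat X_n = F\hat X_{n-1} + K\varepsilon_{n-1}, \qquad Y_n = H\hat X_n + \varepsilon_n, \qquad n \in \mathbb{Z}, \tag{2.9a}$$

which, similar to Eqs. (2.7), allows for the moving average representation

$$Y_n = \left\{\mathbf{1}_d + H\left[\mathbf{1}_N - FB\right]^{-1}KB\right\}\varepsilon_n = \varepsilon_n + H\sum_{\nu=1}^{\infty}F^{\nu-1}K\varepsilon_{n-\nu}, \qquad n \in \mathbb{Z}. \tag{2.9b}$$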
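
As a numerical illustration of Proposition 2.1, the Riccati equation (2.5) can be solved by iterating its right-hand side, after which the gain (2.6) and the innovations covariance (2.8) follow directly. The sketch below assumes NumPy; the function name `steady_state_kalman`, the tolerance and the example matrices are hypothetical choices, and fixed-point iteration is just one simple way of solving (2.5), not necessarily the method used in the paper.

```python
import numpy as np

def steady_state_kalman(F, H, Q, S, R, tol=1e-12, max_iter=10_000):
    """Iterate the right-hand side of (2.5); return (Omega, K, V) of (2.5), (2.6), (2.8)."""
    Omega = Q.copy()
    for _ in range(max_iter):
        G = F @ Omega @ H.T + R                   # F Omega H^T + R
        V = H @ Omega @ H.T + S                   # H Omega H^T + S
        Omega_next = F @ Omega @ F.T + Q - G @ np.linalg.solve(V, G.T)
        converged = np.max(np.abs(Omega_next - Omega)) < tol
        Omega = Omega_next
        if converged:
            break
    G = F @ Omega @ H.T + R
    V = H @ Omega @ H.T + S
    K = np.linalg.solve(V.T, G.T).T               # K = G V^{-1}
    return Omega, K, V

# Hypothetical example with N = 2, d = 1 and uncorrelated noises (R = 0).
F = np.array([[0.5, 0.2], [0.0, 0.3]])
H = np.array([[1.0, 0.0]])
Q = np.eye(2)
S = np.array([[0.5]])
R = np.zeros((2, 1))

Omega, K, V = steady_state_kalman(F, H, Q, S, R)

# Residual of the Riccati equation and spectral radius of F - K H.
G = F @ Omega @ H.T + R
residual = np.max(np.abs(F @ Omega @ F.T + Q - G @ np.linalg.solve(V, G.T) - Omega))
rho = np.max(np.abs(np.linalg.eigvals(F - K @ H)))
```

In this example the residual is at rounding level and `rho` is smaller than one, in line with parts i) and ii) of the proposition. The same iteration, started from a prescribed initial matrix, is the time-varying Riccati recursion of the Kalman filter.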

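For some parameter space $\Theta \subset \mathbb{R}^r$, $r \in \mathbb{N}$, the mappings

$$F_{(\cdot)} : \Theta \to M_N(\mathbb{R}), \qquad H_{(\cdot)} : \Theta \to M_{d,N}(\mathbb{R}), \tag{2.10a}$$

together with a collection of strictly stationary stochastic processes $Z_\vartheta$, $W_\vartheta$, $\vartheta \in \Theta$, with
finite second moments determine a parametric family $(F_\vartheta, H_\vartheta, Z_\vartheta, W_\vartheta)_{\vartheta\in\Theta}$ of linear
state space models according to Definition 2.1. For the variance and covariance matrices of the noise sequences
$Z$, $W$ we use the notation (cf. Eq. (2.1)) $Q_\vartheta = \mathbb{E}Z_{\vartheta,n}Z_{\vartheta,n}^T$,
$S_\vartheta = \mathbb{E}W_{\vartheta,n}W_{\vartheta,n}^T$, and $R_\vartheta = \mathbb{E}Z_{\vartheta,n}W_{\vartheta,n}^T$, which
defines the functions

$$Q_{(\cdot)} : \Theta \to \mathbb{S}_N^{+}(\mathbb{R}), \qquad S_{(\cdot)} : \Theta \to \mathbb{S}_d^{+}(\mathbb{R}), \qquad R_{(\cdot)} : \Theta \to M_{N,d}(\mathbb{R}). \tag{2.10b}$$

It is well known (Brockwell and Davis, 1991, Eq. (11.5.4)) that for this model, minus twice the logarithm
of the Gaussian likelihood of $\vartheta$ based on a sample $y^L = (Y_1, \ldots, Y_L)$ of observations can be written as

$$\mathscr{L}(\vartheta, y^L) = \sum_{n=1}^{L} l_{\vartheta,n} = \sum_{n=1}^{L}\left[d\log 2\pi + \log\det V_\vartheta + \varepsilon_{\vartheta,n}^T V_\vartheta^{-1}\varepsilon_{\vartheta,n}\right], \tag{2.11}$$

where $\varepsilon_{\vartheta,n}$ and $V_\vartheta$ are given by analogues of Eqs. (2.7a) and (2.8), namely

$$\varepsilon_{\vartheta,n} = \left\{\mathbf{1}_d - H_\vartheta\left[\mathbf{1}_N - (F_\vartheta - K_\vartheta H_\vartheta)B\right]^{-1}K_\vartheta B\right\}Y_n, \quad n \in \mathbb{Z}, \qquad V_\vartheta = H_\vartheta\Omega_\vartheta H_\vartheta^T + S_\vartheta, \tag{2.12}$$

and $K_\vartheta$, $\Omega_\vartheta$ are defined in the same way as $K$, $\Omega$ in Eqs. (2.5) and (2.6). In the following we
always assume that $y^L = (Y_{\vartheta_0,1}, \ldots, Y_{\vartheta_0,L})$ is a sample from the output process of the state space
model $(F_{\vartheta_0}, H_{\vartheta_0}, Z_{\vartheta_0}, W_{\vartheta_0})$ corresponding to the parameter value $\vartheta_0$. We
therefore call $\vartheta_0$ the true parameter value. It is important to note that $\varepsilon_{\vartheta_0}$ are the true
innovations of $Y_{\vartheta_0}$, and that therefore $\mathbb{E}\varepsilon_{\vartheta_0,n}\varepsilon_{\vartheta_0,n}^T = V_{\vartheta_0}$,
but that this relation fails to hold for other values of $\vartheta$. This is due to the fact that $\varepsilon_\vartheta$ is
not the true innovations sequence of the state space model corresponding to the parameter value $\vartheta$. We
therefore call the sequence $\varepsilon_\vartheta$ pseudo-innovations.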

The goal of this section is to investigate how the value $\vartheta_0$ can be estimated from $y^L$ by minimizing
Eq. (2.11). The first difficulty one is confronted with is that the pseudo-innovations $\varepsilon_\vartheta$ are defined
in terms of the full history of the process $Y = Y_{\vartheta_0}$, which is not observed. It is therefore necessary to use
an approximation to these innovations which can be computed from the finite sample $y^L$. One such approximation is
obtained if, instead of using the steady-state Kalman filter described in Proposition 2.1, one initializes the
filter at $n = 1$ with some prescribed values. More precisely, we define the approximate pseudo-innovations
$\hat\varepsilon_\vartheta$ via the recursion

$$\hat X_{\vartheta,n} = (F_\vartheta - K_\vartheta H_\vartheta)\hat X_{\vartheta,n-1} + K_\vartheta Y_{n-1}, \qquad \hat\varepsilon_{\vartheta,n} = Y_n - H_\vartheta\hat X_{\vartheta,n}, \qquad n \in \mathbb{N}, \tag{2.13}$$

and the prescription $\hat X_{\vartheta,1} = \hat X_{\vartheta,\text{initial}}$. The initial values
$\hat X_{\vartheta,\text{initial}}$ are usually either sampled from the stationary distribution of $X_\vartheta$, if that is
possible, or set to some deterministic value. Alternatively, one can additionally define a positive semidefinite
matrix $\Omega_{\vartheta,\text{initial}}$ and compute Kalman gain matrices $K_{\vartheta,n}$ recursively via Brockwell and
Davis (1991, Eq. (12.2.6)). While this procedure might be advantageous for small sample sizes, the computational
burden is significantly smaller when the steady-state Kalman gain is used. The asymptotic properties which we are
dealing with in this paper are expected to be the same for both choices because the Kalman gain matrices
$K_{\vartheta,n}$ converge to their steady-state values as $n$ tends to infinity (Hamilton, 1994, Proposition 13.2).

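The QML estimator $\hat\vartheta^L$ for the parameter $\vartheta$ based on the sample $y^L$ is defined as

$$\hat\vartheta^L = \operatorname{argmin}_{\vartheta\in\Theta}\hat{\mathscr{L}}(\vartheta, y^L), \tag{2.14}$$

where $\hat{\mathscr{L}}(\vartheta, y^L)$ is obtained from $\mathscr{L}(\vartheta, y^L)$ by substituting
$\hat\varepsilon_{\vartheta,n}$ from Eq. (2.13) for $\varepsilon_{\vartheta,n}$, i.e.

$$\hat{\mathscr{L}}(\vartheta, y^L) = \sum_{n=1}^{L}\hat l_{\vartheta,n} = \sum_{n=1}^{L}\left[d\log 2\pi + \log\det V_\vartheta + \hat\varepsilon_{\vartheta,n}^T V_\vartheta^{-1}\hat\varepsilon_{\vartheta,n}\right]. \tag{2.15}$$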
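
The recursion (2.13) and the criterion (2.15) translate almost line by line into code. The following Python sketch is a minimal illustration under stated assumptions: the function name `quasi_likelihood` is hypothetical, the gain $K_\vartheta$ and the covariance $V_\vartheta$ are taken as given inputs (in the model they come from Eqs. (2.5), (2.6) and (2.8)), and the numerical values in the example are arbitrary rather than solutions of a Riccati equation.

```python
import numpy as np

def quasi_likelihood(Y, F, H, K, V, x_initial=None):
    """Evaluate (2.15) with approximate pseudo-innovations from the recursion (2.13).

    Y has shape (L, d), with Y[0] = Y_1. The filter is started at
    hat X_1 = x_initial (zero if not given).
    """
    L, d = Y.shape
    N = F.shape[0]
    x_hat = np.zeros(N) if x_initial is None else np.asarray(x_initial, dtype=float)
    _, logdet_V = np.linalg.slogdet(V)
    A = F - K @ H
    total = 0.0
    for n in range(L):
        if n > 0:
            x_hat = A @ x_hat + K @ Y[n - 1]      # hat X_{n+1} from hat X_n and Y_n
        eps = Y[n] - H @ x_hat                    # approximate pseudo-innovation
        total += d * np.log(2 * np.pi) + logdet_V + eps @ np.linalg.solve(V, eps)
    return total

# Scalar toy values (N = d = 1), chosen only to make the arithmetic easy to follow.
F = np.array([[0.5]])
H = np.array([[1.0]])
K = np.array([[0.3]])
V = np.array([[2.0]])

value_zero = quasi_likelihood(np.zeros((10, 1)), F, H, K, V)
value_two = quasi_likelihood(np.array([[1.0], [0.0]]), F, H, K, V)
```

For the all-zero sample every pseudo-innovation vanishes, so the criterion equals $10(\log 2\pi + \log 2)$. For the sample $(1, 0)$ the pseudo-innovations are $1$ and $-0.3$, which adds $(1 + 0.09)/2$ to $2(\log 2\pi + \log 2)$. In practice the criterion would be minimized over $\vartheta$ with a numerical optimizer, recomputing $K_\vartheta$ and $V_\vartheta$ at each candidate value.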

2.2. Technical assumptions and main results 

Our main results about the QML estimation for discrete-time state space models are Theorem 2.4, stating
that the estimator $\hat\vartheta^L$ given by Eq. (2.14) is strongly consistent, which means that $\hat\vartheta^L$
converges to $\vartheta_0$ almost surely, and Theorem 2.5, which asserts the asymptotic normality of
$\hat\vartheta^L$ with the usual $L^{1/2}$-scaling. In order to prove these results, we need to impose the following
conditions.

Assumption D1. The parameter space $\Theta$ is a compact subset of $\mathbb{R}^r$.

Assumption D2. The mappings $F_{(\cdot)}$, $H_{(\cdot)}$, $Q_{(\cdot)}$, $S_{(\cdot)}$, and $R_{(\cdot)}$ in Eqs. (2.10) are continuous.

The next condition guarantees that the models under consideration describe stationary processes.

Assumption D3. For every $\vartheta \in \Theta$, the following hold:

i) the eigenvalues of $F_\vartheta$ have absolute values less than unity,

ii) at least one of the two matrices $Q_\vartheta$ and $S_\vartheta$ is positive definite,

iii) the matrix $V_\vartheta$ is non-singular.

The next lemma shows that the assertions of Assumption D3 hold in fact uniformly in $\vartheta$.

Lemma 2.2. Suppose that Assumptions D1 to D3 are satisfied. Then the following hold.

i) There exists a positive number $\rho < 1$ such that, for all $\vartheta \in \Theta$, it holds that

$$\max\{|\lambda| : \lambda \in \sigma(F_\vartheta)\} \leq \rho. \tag{2.16a}$$

ii) There exists a positive number $\rho < 1$ such that, for all $\vartheta \in \Theta$, it holds that

$$\max\{|\lambda| : \lambda \in \sigma(F_\vartheta - K_\vartheta H_\vartheta)\} \leq \rho, \tag{2.16b}$$

where $K_\vartheta$ is defined by Eqs. (2.5) and (2.6).

iii) There exists a positive number $C$ such that $\|V_\vartheta^{-1}\| \leq C$ for all $\vartheta$.

Proof. Assertion i) is a direct consequence of Assumption D3, i), the assumed smoothness of $\vartheta \mapsto F_\vartheta$
(Assumption D2), the compactness of $\Theta$ (Assumption D1), and the fact (Bernstein, 2005, Fact 10.11.2) that
the eigenvalues of a matrix are continuous functions of its entries. Claim ii) follows with the same argument
from Proposition 2.1, ii) and the fact that the solution of a discrete-time algebraic Riccati equation is a
continuous function of the coefficient matrices (Sun, 1998). Moreover, by Eq. (2.8), the function
$\vartheta \mapsto V_\vartheta$ is continuous, which shows that Assumption D3, iii) holds uniformly in $\vartheta$ as well,
and so iii) is proved. □

For the following assumption about the noise sequences $Z$ and $W$ we use the usual notion of ergodicity
(see, e.g., Durrett, 2010, Chapter 6).

Assumption D4. The process $(Z_{\vartheta_0}^T \ W_{\vartheta_0}^T)^T$ is ergodic.

The assumption that the processes $Z_{\vartheta_0}$ and $W_{\vartheta_0}$ are ergodic implies via the moving average
representation (2.3) and Krengel (1985, Theorem 4.3) that the output process $Y = Y_{\vartheta_0}$ is ergodic. As a
consequence, the pseudo-innovations $\varepsilon_\vartheta$ defined in Eq. (2.12) are ergodic for every $\vartheta \in \Theta$.

Our first identifiability assumption precludes redundancies in the parametrization of the state space
models under consideration and is therefore necessary for the true parameter value $\vartheta_0$ to be estimated
consistently. It will be used in Lemma 2.10 to show that the quasi likelihood function given by Eq. (2.15)
asymptotically has a unique global minimum at $\vartheta_0$.

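Assumption D5. For all $\vartheta_0 \neq \vartheta \in \Theta$, there exists a $z \in \mathbb{C}$ such that

$$H_\vartheta\left[\mathbf{1}_N - (F_\vartheta - K_\vartheta H_\vartheta)z\right]^{-1}K_\vartheta \neq H_{\vartheta_0}\left[\mathbf{1}_N - (F_{\vartheta_0} - K_{\vartheta_0}H_{\vartheta_0})z\right]^{-1}K_{\vartheta_0}, \quad\text{or}\quad V_\vartheta \neq V_{\vartheta_0}. \tag{2.17}$$

Assumption D5 can be rephrased in terms of the spectral densities $f_{Y_\vartheta}$ of the output processes $Y_\vartheta$ of
the state space models $(F_\vartheta, H_\vartheta, Z_\vartheta, W_\vartheta)$. This characterization will be very useful when we
apply the estimation theory developed in this section to state space models that arise from sampling a
continuous-time ARMA process.

Lemma 2.3. If, for all $\vartheta_0 \neq \vartheta \in \Theta$, there exists an $\omega \in [-\pi, \pi]$ such that
$f_{Y_\vartheta}(\omega) \neq f_{Y_{\vartheta_0}}(\omega)$, then Assumption D5 holds.

Proof. We recall from Hamilton (1994, Eq. (10.4.43)) that the spectral density $f_{Y_\vartheta}$ of the output process
$Y_\vartheta$ of the state space model $(F_\vartheta, H_\vartheta, Z_\vartheta, W_\vartheta)$ is given by
$f_{Y_\vartheta}(\omega) = (2\pi)^{-1}\mathscr{H}_\vartheta(e^{i\omega})V_\vartheta\mathscr{H}_\vartheta(e^{-i\omega})^T$,
$\omega \in [-\pi, \pi]$, where
$\mathscr{H}_\vartheta(z) := H_\vartheta\left[\mathbf{1}_N - (F_\vartheta - K_\vartheta H_\vartheta)z\right]^{-1}K_\vartheta + z$.
If Assumption D5 does not hold, we have that both $\mathscr{H}_\vartheta(z) = \mathscr{H}_{\vartheta_0}(z)$ for all
$z \in \mathbb{C}$, and $V_\vartheta = V_{\vartheta_0}$, and, consequently, that $f_{Y_\vartheta}(\omega) = f_{Y_{\vartheta_0}}(\omega)$
for all $\omega \in [-\pi, \pi]$, contradicting the assumption of the lemma. □

Under the assumptions described so far we obtain the following consistency result.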

Theorem 2.4 (Consistency of $\hat\vartheta^L$). Assume that $(F_\vartheta, H_\vartheta, Z_\vartheta, W_\vartheta)_{\vartheta\in\Theta}$ is a
parametric family of state space models according to Definition 2.1, and let
$y^L = (Y_{\vartheta_0,1}, \ldots, Y_{\vartheta_0,L})$ be a sample of length $L$ from the output process of the model
corresponding to $\vartheta_0$. If Assumptions D1 to D5 hold, then the QML estimator
$\hat\vartheta^L = \operatorname{argmin}_{\vartheta\in\Theta}\hat{\mathscr{L}}(\vartheta, y^L)$ is strongly consistent, i.e.
$\hat\vartheta^L \to \vartheta_0$ almost surely, as $L \to \infty$.

We now describe the conditions which we need to impose in addition to Assumptions D1 to D5 for the
asymptotic normality of the QML estimator to hold. The first one excludes the case that the true parameter
value $\vartheta_0$ lies on the boundary of the domain $\Theta$.

Assumption D6. The true parameter value $\vartheta_0$ is an element of the interior of $\Theta$.

Next we need to impose a higher degree of smoothness than stated in Assumption D2 and a stronger
moment condition than Assumption D4.

Assumption D7. The mappings $F_{(\cdot)}$, $H_{(\cdot)}$, $Q_{(\cdot)}$, $S_{(\cdot)}$, and $R_{(\cdot)}$ in Eqs. (2.10) are three
times continuously differentiable.

By the results of the sensitivity analysis of the discrete-time algebraic Riccati equation in Sun (1998),
the same degree of smoothness, namely $C^3$, also carries over to the mapping $\vartheta \mapsto V_\vartheta$.

Assumption D8. The process $(Z_{\vartheta_0}^T \ W_{\vartheta_0}^T)^T$ has finite $(4+\delta)$th moments for some $\delta > 0$.

Assumption D8 implies that the process $Y$ has finite $(4+\delta)$th moments. In the definition of the general
linear stochastic state space model and in Assumption D4, it was only assumed that the sequences $Z$
and $W$ are stationary and ergodic. This structure alone does not entail a sufficient amount of asymptotic
independence for results like Theorem 2.5 to be established. We assume that the process $Y$ is strongly
mixing in the sense of Rosenblatt (1956), and we impose a summability condition on the strong mixing
coefficients, which is known to be sufficient for a Central Limit Theorem for $Y$ to hold (Bradley, 2007;
Ibragimov, 1962).

Assumption D9. Denote by $\alpha_Y$ the strong mixing coefficients of the process $Y = Y_{\vartheta_0}$. There exists a
constant $\delta > 0$ such that $\sum_{m=0}^{\infty}[\alpha_Y(m)]^{\delta/(2+\delta)} < \infty$.

In the case of exponential strong mixing, Assumption D9 is always satisfied, and it is no restriction to
assume that the $\delta$ appearing in Assumptions D8 and D9 are the same. It has been shown in Mokkadem
(1988); Schlemm and Stelzer (2012) that, because of the autoregressive structure of the state equation
(2.2a), exponential strong mixing of the output process $Y_{\vartheta_0}$ can be assured by imposing the condition
that the process $Z_{\vartheta_0}$ is an i.i.d. sequence whose marginal distributions possess a non-trivial absolutely
continuous component in the sense of Lebesgue's decomposition theorem.
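
To see why exponential strong mixing makes the summability condition automatic, the following short computation may help. The constants $C$ and $\rho$ and the generic exponent $\kappa$ are notation introduced here for illustration; the argument works for any positive exponent and therefore does not depend on the particular power used in Assumption D9.

```latex
% Assume exponential strong mixing: \alpha_Y(m) \le C \rho^m for some C > 0, \rho \in (0,1).
% Then, for every exponent \kappa > 0, a geometric series bound gives
\sum_{m=0}^{\infty} \left[\alpha_Y(m)\right]^{\kappa}
  \;\le\; C^{\kappa} \sum_{m=0}^{\infty} \left(\rho^{\kappa}\right)^{m}
  \;=\; \frac{C^{\kappa}}{1 - \rho^{\kappa}} \;<\; \infty ,
% since 0 < \rho^{\kappa} < 1. In particular the sum is finite for every \delta > 0,
% so the same \delta can be used in Assumptions D8 and D9.
```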

Finally, we require another identifiability assumption, which will be used to ensure that the Fisher
information matrix of the QML estimator is non-singular. This is necessary because the asymptotic covariance
matrix in the asymptotic normality result for $\hat\vartheta^L$ is directly related to the inverse of that matrix.
Assumption D10 is formulated in terms of the first derivative of the parametrization of the model, which makes
it relatively easy to check in practice; the Fisher information matrix, in contrast, is related to the second
derivative of the logarithmic Gaussian likelihood. For $j \in \mathbb{N}$ and $\vartheta \in \Theta$, the vector
$\psi_{\vartheta,j} \in \mathbb{R}^{(j+2)d^2}$ is defined as

$$\psi_{\vartheta,j} = \begin{bmatrix} \left[\mathbf{1}_{j+1} \otimes K_\vartheta^T \otimes H_\vartheta\right]\begin{bmatrix} (\operatorname{vec}\mathbf{1}_N)^T & (\operatorname{vec}F_\vartheta)^T & \cdots & (\operatorname{vec}F_\vartheta^{j})^T \end{bmatrix}^T \\ \operatorname{vec}V_\vartheta \end{bmatrix}, \tag{2.18}$$

where $\otimes$ denotes the Kronecker product of two matrices, and $\operatorname{vec}$ is the linear operator that
transforms a matrix into a vector by stacking its columns on top of each other.

Assumption D10. There exists an integer $j_0 \in \mathbb{N}$ such that the $[(j_0+2)d^2] \times r$ matrix
$\nabla_\vartheta\psi_{\vartheta_0,j_0}$ has rank $r$.
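
The vector in Eq. (2.18) and the rank condition of Assumption D10 can be explored numerically. The sketch below is an illustration only: the helper names `vec`, `psi`, `numerical_jacobian` and `toy_psi` are hypothetical, the Jacobian is approximated by central differences, and the toy parametrization treats $K_\vartheta$ and $V_\vartheta$ as free parameters, whereas in the model they depend on $\vartheta$ through Eqs. (2.5), (2.6) and (2.8). The code uses the identity $\operatorname{vec}(HXK) = (K^T \otimes H)\operatorname{vec}X$, so the first blocks of $\psi_{\vartheta,j}$ are $\operatorname{vec}(H_\vartheta F_\vartheta^i K_\vartheta)$, $i = 0, \ldots, j$.

```python
import numpy as np

def vec(A):
    """Stack the columns of A on top of each other."""
    return np.asarray(A, dtype=float).reshape(-1, order="F")

def psi(F, H, K, V, j):
    """Assemble the vector of Eq. (2.18) for given system matrices."""
    powers = [np.linalg.matrix_power(F, i) for i in range(j + 1)]
    stacked = np.concatenate([vec(P) for P in powers])      # length (j+1) N^2
    lift = np.kron(np.eye(j + 1), np.kron(K.T, H))          # block diagonal, (j+1)d^2 rows
    return np.concatenate([lift @ stacked, vec(V)])

def numerical_jacobian(f, theta, h=1e-6):
    """Central-difference approximation of the Jacobian of f at theta."""
    theta = np.asarray(theta, dtype=float)
    columns = []
    for i in range(theta.size):
        step = np.zeros_like(theta)
        step[i] = h
        columns.append((f(theta + step) - f(theta - step)) / (2 * h))
    return np.column_stack(columns)

# Check of the vec identity for N = 2, d = 1 and j = 2 (arbitrary example matrices).
F = np.array([[0.5, 0.2], [0.0, 0.3]])
H = np.array([[1.0, 0.0]])
K = np.array([[0.4], [0.1]])
V = np.array([[2.0]])
p = psi(F, H, K, V, j=2)
direct = np.array([(H @ K).item(), (H @ F @ K).item(), (H @ F @ F @ K).item(), 2.0])

# Hypothetical scalar parametrization theta = (a, k, v): F = a, H = 1, K = k, V = v.
def toy_psi(theta, j=1):
    a, k, v = theta
    return psi(np.array([[a]]), np.array([[1.0]]), np.array([[k]]), np.array([[v]]), j)

jacobian = numerical_jacobian(toy_psi, [0.5, 0.3, 2.0])
rank = np.linalg.matrix_rank(jacobian)
```

For the toy parametrization $\psi_{\vartheta,1} = (k, ak, v)^T$, whose Jacobian has determinant $-k$, so the rank is $3 = r$ whenever $k \neq 0$; at $k = 0$ the parameter $a$ could not be identified, which is the kind of degeneracy the assumption rules out.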

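Our main result about the asymptotic distribution of the QML estimator for discrete-time state space
models is the following theorem. Equation (2.20) shows in particular that this asymptotic distribution is
independent of the choice of the initial values $\hat X_{\vartheta,\text{initial}}$.

Theorem 2.5 (Asymptotic normality of $\hat\vartheta^L$). Assume that $(F_\vartheta, H_\vartheta, Z_\vartheta, W_\vartheta)_{\vartheta\in\Theta}$
is a parametric family of state space models according to Definition 2.1, and let
$y^L = (Y_{\vartheta_0,1}, \ldots, Y_{\vartheta_0,L})$ be a sample of length $L$ from the output process of the model
corresponding to $\vartheta_0$. If Assumptions D1 to D10 hold, then the QML estimator
$\hat\vartheta^L = \operatorname{argmin}_{\vartheta\in\Theta}\hat{\mathscr{L}}(\vartheta, y^L)$ is asymptotically normally
distributed with covariance matrix $\Xi = J^{-1}IJ^{-1}$, i.e.

$$\sqrt{L}\left(\hat\vartheta^L - \vartheta_0\right) \xrightarrow[L\to\infty]{d} \mathscr{N}(0, \Xi), \tag{2.19}$$

where

$$I = \lim_{L\to\infty} L^{-1}\operatorname{Var}\left(\nabla_\vartheta\mathscr{L}(\vartheta_0, y^L)\right), \qquad J = \lim_{L\to\infty} L^{-1}\nabla_\vartheta^2\mathscr{L}(\vartheta_0, y^L). \tag{2.20}$$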

Note that $I$ and $J$, which give the asymptotic covariance matrix $\Xi$ of the estimators, are deterministic
and only depend on the true parameter value $\vartheta_0$. The matrix $J$ actually is the Fisher information and an
alternative expression for $J$ can be found in Lemma 2.17. Despite being deterministic, the asymptotic
variance $\Xi$ is not immediate to obtain and needs to be estimated, as is usual in connection with QML
estimators. This is a non-trivial task and a detailed analysis of this is beyond the scope of the present paper,
but worthy of consideration in more detail in future work. However, it should be noted that when $\hat\Xi^L$ is a
consistent estimator for $\Xi$, then Theorem 2.5 implies that
$\sqrt{L}\,(\hat\Xi^L)^{-1/2}\left(\hat\vartheta^L - \vartheta_0\right) \xrightarrow{d} \mathscr{N}(0, \mathbf{1}_r)$. Observe
that no stable convergence in law (in the sense originally introduced by Rényi (1963)) is needed to obtain
the latter result for our QML estimator, as this stronger convergence concept is needed only when the
limiting variance in a "mixed normal limit theorem" is random.

In practice, estimating the asymptotic covariance matrix $\Xi$ is important in order to construct confidence
regions for the estimated parameters or to perform statistical tests. The problem of estimating it
has also been considered in the framework of estimating weak VARMA processes in Boubacar Mainassara
and Francq (2011), where the following procedure has been suggested, which is also applicable in
our set-up. First, $J(\vartheta_0)$ is estimated consistently by
$\hat J^L = L^{-1}\nabla^2_\vartheta \hat{\mathscr L}(\hat\vartheta^L, y^L)$. For the computation of $\hat J^L$
we rely on the fact that the Kalman filter can be used not only to evaluate the Gaussian log-likelihood of
a state space model but also its gradient and Hessian. The most straightforward way of achieving this is
by direct differentiation of the Kalman filter equations, which results in increasing the number of passes
through the filter to $r + 1$ and $r(r + 3)/2$ for the gradient and the Hessian, respectively. The construction
of a consistent estimator of $I = I(\vartheta_0)$ is based on the observation that
$I = \sum_{\Delta\in\mathbb{Z}} \operatorname{Cov}(\ell_{\vartheta_0,n}, \ell_{\vartheta_0,n+\Delta})$, where
$\ell_{\vartheta_0,n} = \nabla_\vartheta\bigl[\log\det V_{\vartheta_0} + \varepsilon_{\vartheta_0,n}^T V_{\vartheta_0}^{-1}\varepsilon_{\vartheta_0,n}\bigr]$.
Assuming that $(\ell_{\vartheta_0,n})_{n\in\mathbb{N}_+}$ admits an infinite-order AR representation
$\Phi(B)\ell_{\vartheta_0,n} = U_n$, where $\Phi(z) = 1_r + \sum_{i=1}^{\infty}\Phi_i z^i$ and $(U_n)_{n\in\mathbb{N}_+}$ is a weak white noise with covariance
matrix $\Sigma_U$, it follows from the interpretation of $I/(2\pi)$ as the value of the spectral density of $(\ell_{\vartheta_0,n})_{n\in\mathbb{N}_+}$ at
frequency zero that $I$ can also be written as $I = \Phi(1)^{-1}\Sigma_U\Phi(1)^{-T}$. The idea is to fit a long autoregression to
$(\ell_{\hat\vartheta^L,n})_{n=1,\ldots,L}$, the empirical counterparts of $(\ell_{\vartheta_0,n})_{n\in\mathbb{N}_+}$, which are defined by replacing $\vartheta_0$ with the estimate
$\hat\vartheta^L$ in the definition of $\ell_{\vartheta_0,n}$. This is done by choosing an integer $s > 0$, and performing a least-squares
regression of $\ell_{\hat\vartheta^L,n}$ on $\ell_{\hat\vartheta^L,n-1}, \ldots, \ell_{\hat\vartheta^L,n-s}$, $s + 1 \le n \le L$. Denoting by
$\hat\Phi^L_s(z) = 1_r + \sum_{i=1}^{s}\hat\Phi^L_{i,s} z^i$ the obtained
empirical autoregressive polynomial and by $\hat\Sigma^L_s$ the empirical covariance matrix of the residuals of
the regression, it was claimed in Boubacar Mainassara and Francq (2011, Theorem 4) that under the additional
assumption $\mathbb{E}\|\varepsilon_n\|^{8+\delta} < \infty$ the spectral estimator
$\hat I^L_s = (\hat\Phi^L_s(1))^{-1}\hat\Sigma^L_s(\hat\Phi^L_s(1))^{-T}$ converges to $I$ in
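probability as $L, s \to \infty$ if $s^3/L \to 0$. The covariance matrix of $\hat\vartheta^L$ is then estimated consistently as

$$\hat\Xi^L_s = \frac{1}{L}\bigl(\hat J^L\bigr)^{-1}\,\hat I^L_s\,\bigl(\hat J^L\bigr)^{-1}. \qquad (2.21)$$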
In the simulation study performed in Section 4.2, we estimate the covariance matrix $\Xi$ of the estimators
in the way just described. From a comparison with the standard deviations of the estimators obtained from
the simulations it can be seen that the approach performs convincingly.
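The long-autoregression step of this procedure can be sketched in code for the scalar case $r = 1$. The sketch below is illustrative only and is not the implementation used in Section 4.2: the function names, the centring of the scores, the plain Gaussian-elimination solver and the final division by $L$ (to turn the asymptotic sandwich $J^{-1} I J^{-1}$ into a covariance for the estimator itself) are all assumptions made here.

```python
import random


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def long_run_variance(scores, s):
    """Scalar analogue of Phi_hat(1)^{-1} Sigma_hat Phi_hat(1)^{-T} from an AR(s) fit."""
    length = len(scores)
    mean = sum(scores) / length
    x = [v - mean for v in scores]  # centring is an assumption of this sketch
    # Normal equations for the regression of x[n] on x[n-1], ..., x[n-s].
    gram = [[sum(x[n - i] * x[n - j] for n in range(s, length))
             for j in range(1, s + 1)] for i in range(1, s + 1)]
    rhs = [sum(x[n - i] * x[n] for n in range(s, length)) for i in range(1, s + 1)]
    coef = solve_linear(gram, rhs)  # x[n] ~ sum_i coef[i-1] * x[n-i]
    resid = [x[n] - sum(coef[i - 1] * x[n - i] for i in range(1, s + 1))
             for n in range(s, length)]
    sigma2 = sum(e * e for e in resid) / len(resid)
    phi_at_one = 1.0 - sum(coef)  # Phi(1), since Phi_i = -coef[i-1]
    return sigma2 / phi_at_one ** 2


def sandwich_variance(scores, j_hat, s):
    """Scalar analogue of J^{-1} I J^{-1} / L (the division by L is an assumption)."""
    return long_run_variance(scores, s) / (j_hat ** 2 * len(scores))


if __name__ == "__main__":
    rng = random.Random(0)
    series = [0.0]
    for _ in range(20000):
        series.append(0.5 * series[-1] + rng.gauss(0.0, 1.0))
    # For this AR(1) toy series the long-run variance is 1 / (1 - 0.5)^2 = 4.
    print(long_run_variance(series, s=3))
```

In the multivariate setting the scalar division is replaced by matrix inversion, and the order $s$ has to grow with $L$ as required by the condition quoted above.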

A possible alternative approach to estimate the asymptotic covariance matrix $\Xi$ may also be the use of
bootstrap techniques. However, it seems that to this end the existing bootstrapping techniques need to be
extended considerably (cf. Brockwell, Kreiß and Niebuhr (2012)).

2.3. Proof of Theorem 2.4 - Strong consistency 

In this section we prove the strong consistency of the QML estimator $\hat\vartheta^L$.

The standard idea why the QML (or sometimes also Gaussian maximum likelihood) estimators work in 
a linear time series/state space model setting is that the QML approach basically is very close to estimating 
the parameters using the spectral density which is in turn in a one-to-one relation with the second moment 
structure (see e.g. Brockwell and Davis (1991, Chapter 10)). The reason is, of course, that a Gaussian 
process is completely characterized by the mean and autocovariance function. So as soon as one knows 
that the parameters to be estimated are identifiable from the autocovariance function (and the mean) and 
the process is known to be ergodic, the QML estimators should be strongly consistent. Despite this simple 
standard idea, the upcoming actual proof of the strong consistency is lengthy as well as technical and 
consists of the following steps: 

1. When we use the Kalman filter with fixed parameters $\vartheta$ on the finite sample $y^L$, the obtained
pseudo-innovations $\hat\varepsilon_\vartheta$ approximate the true pseudo-innovations $\varepsilon_\vartheta$ (obtainable from the steady state
Kalman filter in theory) well; see Lemma 2.6.

2. The quasi likelihood (QL) function $\hat{\mathscr L}$ obtained from the finite sample (via $\hat\varepsilon_\vartheta$) converges for the
sample size $L \to \infty$ uniformly in the parameter space to the true QL function $\mathscr L$ (obtained from the
pseudo-innovations $\varepsilon_\vartheta$); see Lemma 2.7.

3. As the number $L$ of observations grows, the QL function $\mathscr L$ divided by $L$ converges to the expected
QL function $\mathscr Q$ uniformly in the parameter space; see Lemma 2.8.

4. The expected QL function $\mathscr Q$ has a unique minimum at the true parameter $\vartheta_0$; see Lemmas 2.9
and 2.10.

5. The QL function $\hat{\mathscr L}$ divided by the number of observations evaluated at its minimum in the parameter
space (i.e., at the QML estimator) converges almost surely to the expected QL function $\mathscr Q$ evaluated
at the true parameter $\vartheta_0$ (its minimum).

6. Finally, one can show that also the argument of the minimum of the QL function $\hat{\mathscr L}$ (i.e. the QML
estimator) converges for $L \to \infty$ to $\vartheta_0$, which proves the strong consistency.

As a first step we show that the stationary pseudo-innovations processes defined by the steady-state
Kalman filter are uniformly approximated by their counterparts based on the finite sample $y^L$.

Lemma 2.6. Under Assumptions D1 to D3, the pseudo-innovations sequences $\varepsilon_\vartheta$ and $\hat\varepsilon_\vartheta$ defined by the
Kalman filter equations (2.7a) and (2.13) have the following properties.

i) If the initial values $\hat X_{\vartheta,\mathrm{initial}}$ are such that $\sup_{\vartheta\in\Theta}\|\hat X_{\vartheta,\mathrm{initial}}\|$ is almost surely finite, then, with probability
one, there exist a positive number $C$ and a positive number $\rho < 1$, such that
$\sup_{\vartheta\in\Theta}\|\varepsilon_{\vartheta,n} - \hat\varepsilon_{\vartheta,n}\| \le C\rho^n$,
$n \in \mathbb{N}$. In particular, $\hat\varepsilon_{\vartheta_0,n}$ converges to the true innovations $\varepsilon_n = \varepsilon_{\vartheta_0,n}$ at an exponential rate.

ii) The sequences $\varepsilon_\vartheta$ are linear functions of $Y$, i.e. there exist matrix sequences $(c_{\vartheta,\nu})_{\nu\ge 1}$, such that
$\varepsilon_{\vartheta,n} = Y_n + \sum_{\nu=1}^{\infty} c_{\vartheta,\nu} Y_{n-\nu}$. The matrices $c_{\vartheta,\nu}$ are uniformly exponentially bounded, i.e. there exist a positive
constant $C$ and a positive constant $\rho < 1$, such that $\sup_{\vartheta\in\Theta}\|c_{\vartheta,\nu}\| \le C\rho^\nu$, $\nu \in \mathbb{N}$.

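Proof. We first prove part i) about the uniform exponential approximation of $\varepsilon$ by $\hat\varepsilon$. Iterating the Kalman
equations (2.7a) and (2.13), we find that, for $n \in \mathbb{N}$,

$$\varepsilon_{\vartheta,n} = Y_n - H_\vartheta (F_\vartheta - K_\vartheta H_\vartheta)^{n-1}\hat X_{\vartheta,1} - \sum_{\nu=1}^{n-1} H_\vartheta (F_\vartheta - K_\vartheta H_\vartheta)^{\nu-1} K_\vartheta Y_{n-\nu}, \quad\text{and}$$

$$\hat\varepsilon_{\vartheta,n} = Y_n - H_\vartheta (F_\vartheta - K_\vartheta H_\vartheta)^{n-1}\hat X_{\vartheta,\mathrm{initial}} - \sum_{\nu=1}^{n-1} H_\vartheta (F_\vartheta - K_\vartheta H_\vartheta)^{\nu-1} K_\vartheta Y_{n-\nu}.$$

Thus, using the fact that, by Lemma 2.2, the spectral radii of $F_\vartheta - K_\vartheta H_\vartheta$ are bounded by $\rho < 1$, it follows
that

$$\sup_{\vartheta\in\Theta}\|\varepsilon_{\vartheta,n} - \hat\varepsilon_{\vartheta,n}\| = \sup_{\vartheta\in\Theta}\bigl\|H_\vartheta (F_\vartheta - K_\vartheta H_\vartheta)^{n-1}(\hat X_{\vartheta,1} - \hat X_{\vartheta,\mathrm{initial}})\bigr\| \le \|H\|_{L^\infty(\Theta)}\,\rho^{n-1}\sup_{\vartheta\in\Theta}\|\hat X_{\vartheta,1} - \hat X_{\vartheta,\mathrm{initial}}\|,$$

where $\|H\|_{L^\infty(\Theta)} := \sup_{\vartheta\in\Theta}\|H_\vartheta\|$ denotes the supremum norm of $H_{(\cdot)}$, which is finite by the Extreme Value
Theorem. Since the last factor is almost surely finite by assumption, the claim follows. For part ii), we observe
that Eq. (2.7a) and Lemma 2.2, ii) imply that $\varepsilon_\vartheta$ has the infinite-order moving average representation
$\varepsilon_{\vartheta,n} = Y_n - H_\vartheta\sum_{\nu=1}^{\infty}(F_\vartheta - K_\vartheta H_\vartheta)^{\nu-1}K_\vartheta Y_{n-\nu}$, whose coefficients
$c_{\vartheta,\nu} := -H_\vartheta (F_\vartheta - K_\vartheta H_\vartheta)^{\nu-1}K_\vartheta$ are uniformly
exponentially bounded. Explicitly, $\|c_{\vartheta,\nu}\| \le \|H\|_{L^\infty(\Theta)}\,\|K\|_{L^\infty(\Theta)}\,\rho^{\nu-1}$. This completes the proof. □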
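The exponential forgetting of the initial value in part i) can be checked numerically in the scalar case. The following is a minimal sketch under stated assumptions: the filter recursion $\hat X_{n+1} = (F - KH)\hat X_n + K Y_n$, $\varepsilon_n = Y_n - H\hat X_n$ is inferred from the iterated formulas in the proof rather than copied from Eqs. (2.7a) and (2.13), and the function names and numerical values are hypothetical.

```python
import math


def pseudo_innovations(y, f, h, k, x_init):
    """Scalar filter: eps_n = y_n - h * x_n, then x_{n+1} = (f - k*h) * x_n + k * y_n."""
    x, eps = x_init, []
    for obs in y:
        eps.append(obs - h * x)
        x = (f - k * h) * x + k * obs
    return eps


def initialisation_gap(y, f, h, k, x_a, x_b):
    """Absolute difference of the pseudo-innovations for two initial values."""
    eps_a = pseudo_innovations(y, f, h, k, x_a)
    eps_b = pseudo_innovations(y, f, h, k, x_b)
    return [abs(a - b) for a, b in zip(eps_a, eps_b)]


if __name__ == "__main__":
    observations = [math.sin(n) for n in range(30)]
    gaps = initialisation_gap(observations, f=0.9, h=1.0, k=0.5, x_a=0.0, x_b=5.0)
    rate = abs(0.9 - 0.5 * 1.0)  # |f - k*h|, playing the role of rho
    for n, gap in enumerate(gaps[:5]):
        # Up to rounding, the gap equals |h| * rate**n * |x_a - x_b|, whatever the data are.
        print(n, gap, 1.0 * rate ** n * 5.0)
```

The gap does not depend on the observations at all, which is why the bound in part i) holds uniformly once the initial values are almost surely bounded.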

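Lemma 2.7. Let $\mathscr L$ and $\hat{\mathscr L}$ be given by Eqs. (2.11) and (2.15). If Assumptions D1 to D3 are satisfied,
then the sequence $L^{-1}\sup_{\vartheta\in\Theta}\bigl|\hat{\mathscr L}(\vartheta, y^L) - \mathscr L(\vartheta, y^L)\bigr|$ converges to zero almost surely, as $L \to \infty$.

Proof. We first observe that

$$\bigl|\mathscr L(\vartheta, y^L) - \hat{\mathscr L}(\vartheta, y^L)\bigr| = \Bigl|\sum_{n=1}^{L}\bigl[(\varepsilon_{\vartheta,n} - \hat\varepsilon_{\vartheta,n})^T V_\vartheta^{-1}\varepsilon_{\vartheta,n} + \hat\varepsilon_{\vartheta,n}^T V_\vartheta^{-1}(\varepsilon_{\vartheta,n} - \hat\varepsilon_{\vartheta,n})\bigr]\Bigr|.$$

The fact that, by Lemma 2.2, iii), there exists a constant $C$ such that $\|V_\vartheta^{-1}\| \le C$ implies, together with Lemma 2.6, i), that

$$\frac{1}{L}\sup_{\vartheta\in\Theta}\bigl|\mathscr L(\vartheta, y^L) - \hat{\mathscr L}(\vartheta, y^L)\bigr| \le \frac{C}{L}\sum_{n=1}^{L}\rho^n\Bigl[\sup_{\vartheta\in\Theta}\|\varepsilon_{\vartheta,n}\| + \sup_{\vartheta\in\Theta}\|\hat\varepsilon_{\vartheta,n}\|\Bigr]. \qquad (2.22)$$

Lemma 2.6, ii) and the assumption that $Y$ has finite second moments imply that $\mathbb{E}\sup_{\vartheta\in\Theta}\|\varepsilon_{\vartheta,n}\|$ is finite.
Applying Markov's inequality, one sees that, for every positive $\epsilon$,

$$\sum_{n=1}^{\infty}\mathbb{P}\Bigl(\rho^n\sup_{\vartheta\in\Theta}\|\varepsilon_{\vartheta,n}\| > \epsilon\Bigr) \le \mathbb{E}\sup_{\vartheta\in\Theta}\|\varepsilon_{\vartheta,1}\|\sum_{n=1}^{\infty}\frac{\rho^n}{\epsilon} < \infty,$$

because $\rho < 1$. The Borel-Cantelli Lemma shows that $\rho^n\sup_{\vartheta\in\Theta}\|\varepsilon_{\vartheta,n}\|$ converges to zero almost surely,
as $n \to \infty$. In an analogous way one can show that $\rho^n\sup_{\vartheta\in\Theta}\|\hat\varepsilon_{\vartheta,n}\|$ converges to zero almost surely, and,
consequently, so does the Cesàro mean in Eq. (2.22). The claim thus follows. □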

Lemma 2.8. If Assumptions D1 to D4 hold, then, with probability one, the sequence of random functions
$\vartheta \mapsto L^{-1}\hat{\mathscr L}(\vartheta, y^L)$ converges, as $L$ tends to infinity, uniformly in $\vartheta$ to the limiting function $\mathscr Q : \Theta \to \mathbb{R}$
defined by

$$\mathscr Q(\vartheta) = d\log(2\pi) + \log\det V_\vartheta + \mathbb{E}\,\varepsilon_{\vartheta,1}^T V_\vartheta^{-1}\varepsilon_{\vartheta,1}. \qquad (2.23)$$

Proof. In view of the approximation results in Lemma 2.7, it is enough to show that the sequence of
random functions $\vartheta \mapsto L^{-1}\mathscr L(\vartheta, y^L)$ converges uniformly to $\mathscr Q$. The proof of this assertion is based on the
observation following Assumption D4 that for each $\vartheta \in \Theta$ the sequence $\varepsilon_\vartheta$ is ergodic and its consequence
that, by Birkhoff's Ergodic Theorem (Durrett, 2010, Theorem 6.2.1), the sequence $L^{-1}\mathscr L(\vartheta, y^L)$ converges
to $\mathscr Q(\vartheta)$ point-wise. The stronger statement of uniform convergence follows from Assumption D1 that $\Theta$ is
compact by an argument analogous to the proof of Ferguson (1996, Theorem 16). □
Lemma 2.9. Assume that Assumptions D3 and D4 as well as the first alternative of Assumption D5 hold.
If $\varepsilon_{\vartheta,1} = \varepsilon_{\vartheta_0,1}$ almost surely, then $\vartheta = \vartheta_0$.

Proof. Assume, for the sake of contradiction, that $\vartheta \ne \vartheta_0$. By Assumption D5, there exist matrices
$C_j \in M_d(\mathbb{R})$, $j \in \mathbb{N}_0$, such that, for $|z| \le 1$,

$$H_\vartheta\bigl[1_N - (F_\vartheta - K_\vartheta H_\vartheta)z\bigr]^{-1}K_\vartheta - H_{\vartheta_0}\bigl[1_N - (F_{\vartheta_0} - K_{\vartheta_0}H_{\vartheta_0})z\bigr]^{-1}K_{\vartheta_0} = \sum_{j=j_0}^{\infty} C_j z^j, \qquad (2.24)$$

where $C_{j_0} \ne 0$, for some $j_0 \ge 0$. Using Eq. (2.7b) and the assumed equality of $\varepsilon_{\vartheta,1}$ and $\varepsilon_{\vartheta_0,1}$, this implies
that $0_d = \sum_{j=j_0}^{\infty} C_j Y_{j_0 - j}$ almost surely; in particular, the random variable $C_{j_0}Y_0$ is equal to a linear
combination of the components of $Y_n$, $n < 0$. It thus follows from the interpretation of the innovations
sequence $\varepsilon_{\vartheta_0}$ as linear prediction errors for the process $Y$ that $C_{j_0}\varepsilon_{\vartheta_0,0}$ is equal to zero, which implies that
$\mathbb{E}\,C_{j_0}\varepsilon_{\vartheta_0,0}\varepsilon_{\vartheta_0,0}^T C_{j_0}^T = C_{j_0}V_{\vartheta_0}C_{j_0}^T = 0_d$. Since $V_{\vartheta_0}$ is assumed to be non-singular, this implies that the matrix
$C_{j_0}$ is the null matrix, a contradiction to Eq. (2.24). □

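Lemma 2.10. Under Assumptions D1 to D3 and D5, the function $\mathscr Q : \Theta \to \mathbb{R}$, defined in Eq. (2.23),
has a unique global minimum at $\vartheta_0$.

Proof. We first observe that the difference $\varepsilon_{\vartheta,1} - \varepsilon_{\vartheta_0,1}$ is an element of the Hilbert space spanned by the
random variables $\{Y_n, n \le 0\}$, and that $\varepsilon_{\vartheta_0,1}$ is, by definition, orthogonal to this space. Thus, the expectation
$\mathbb{E}(\varepsilon_{\vartheta,1} - \varepsilon_{\vartheta_0,1})^T V_\vartheta^{-1}\varepsilon_{\vartheta_0,1}$ is equal to zero and, consequently, $\mathscr Q(\vartheta)$ can be written as

$$\mathscr Q(\vartheta) = d\log(2\pi) + \mathbb{E}\,\varepsilon_{\vartheta_0,1}^T V_\vartheta^{-1}\varepsilon_{\vartheta_0,1} + \mathbb{E}(\varepsilon_{\vartheta,1} - \varepsilon_{\vartheta_0,1})^T V_\vartheta^{-1}(\varepsilon_{\vartheta,1} - \varepsilon_{\vartheta_0,1}) + \log\det V_\vartheta.$$

In particular, since $\mathbb{E}\,\varepsilon_{\vartheta_0,1}^T V_{\vartheta_0}^{-1}\varepsilon_{\vartheta_0,1} = \operatorname{tr}\bigl[V_{\vartheta_0}^{-1}\mathbb{E}\,\varepsilon_{\vartheta_0,1}\varepsilon_{\vartheta_0,1}^T\bigr] = d$, it follows that
$\mathscr Q(\vartheta_0) = \log\det V_{\vartheta_0} + d(1 + \log(2\pi))$. The elementary inequality $x - \log x \ge 1$, for $x > 0$, implies that
$\operatorname{tr} M - \log\det M \ge d$ for all
symmetric positive definite $d \times d$ matrices $M \in S_d^{++}(\mathbb{R})$ with equality if and only if $M = 1_d$. Using this
inequality for $M = V_\vartheta^{-1}V_{\vartheta_0}$, we thus obtain that, for all $\vartheta \in \Theta$,

$$\mathscr Q(\vartheta) - \mathscr Q(\vartheta_0) = -d + \operatorname{tr}\bigl[V_\vartheta^{-1}\mathbb{E}\,\varepsilon_{\vartheta_0,1}\varepsilon_{\vartheta_0,1}^T\bigr] - \log\det\bigl(V_\vartheta^{-1}V_{\vartheta_0}\bigr) + \mathbb{E}(\varepsilon_{\vartheta,1} - \varepsilon_{\vartheta_0,1})^T V_\vartheta^{-1}(\varepsilon_{\vartheta,1} - \varepsilon_{\vartheta_0,1})$$
$$\ge \mathbb{E}(\varepsilon_{\vartheta,1} - \varepsilon_{\vartheta_0,1})^T V_\vartheta^{-1}(\varepsilon_{\vartheta,1} - \varepsilon_{\vartheta_0,1}) \ge 0.$$

It remains to argue that this chain of inequalities is in fact a strict inequality if $\vartheta \ne \vartheta_0$. If $V_\vartheta \ne V_{\vartheta_0}$, the first
inequality is strict, and we are done. If $V_\vartheta = V_{\vartheta_0}$, the first alternative of Assumption D5 is satisfied. The
second inequality is an equality if and only if $\varepsilon_{\vartheta,1} = \varepsilon_{\vartheta_0,1}$ almost surely, which, by Lemma 2.9, implies that
$\vartheta = \vartheta_0$. Thus, the function $\mathscr Q$ has a unique global minimum at $\vartheta_0$. □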
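The matrix inequality invoked in this proof follows from the scalar one by passing to eigenvalues. The derivation below is a sketch; the eigenvalues $\lambda_1, \ldots, \lambda_d$ of $M$ are notation introduced here and do not appear in the original argument.

```latex
\begin{align*}
\operatorname{tr} M - \log\det M
  &= \sum_{i=1}^{d} \lambda_i - \log \prod_{i=1}^{d} \lambda_i
   = \sum_{i=1}^{d} \bigl( \lambda_i - \log \lambda_i \bigr)
   \ge \sum_{i=1}^{d} 1 = d ,
\end{align*}
% Equality forces \lambda_i - \log\lambda_i = 1, i.e. \lambda_i = 1, for every i.
```

The product $V_\vartheta^{-1}V_{\vartheta_0}$ need not itself be symmetric, but it is similar to the symmetric positive definite matrix $V_\vartheta^{-1/2}V_{\vartheta_0}V_\vartheta^{-1/2}$. Its eigenvalues are therefore real and positive and it is diagonalizable, so the same computation applies, and equality holds only if all eigenvalues equal one, that is, only if $V_\vartheta = V_{\vartheta_0}$.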

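Proof of Theorem 2.4. We shall first show that the sequence $L^{-1}\hat{\mathscr L}(\hat\vartheta^L, y^L)$, $L \in \mathbb{N}$, converges almost
surely to the deterministic number $\mathscr Q(\vartheta_0)$ as the sample size $L$ tends to infinity. Assume that, for some
positive number $\epsilon$, it holds that $\sup_{\vartheta\in\Theta}\bigl|L^{-1}\hat{\mathscr L}(\vartheta, y^L) - \mathscr Q(\vartheta)\bigr| \le \epsilon$. It then follows that

$$L^{-1}\hat{\mathscr L}(\hat\vartheta^L, y^L) \le L^{-1}\hat{\mathscr L}(\vartheta_0, y^L) \le \mathscr Q(\vartheta_0) + \epsilon \quad\text{and}\quad L^{-1}\hat{\mathscr L}(\hat\vartheta^L, y^L) \ge \mathscr Q(\hat\vartheta^L) - \epsilon \ge \mathscr Q(\vartheta_0) - \epsilon,$$

where it was used that $\hat\vartheta^L$ is defined to minimize $\hat{\mathscr L}(\cdot, y^L)$ and that, by Lemma 2.10, $\vartheta_0$ minimizes $\mathscr Q(\cdot)$.
In particular, it follows that $\bigl|L^{-1}\hat{\mathscr L}(\hat\vartheta^L, y^L) - \mathscr Q(\vartheta_0)\bigr| \le \epsilon$. This observation and Lemma 2.8 immediately
imply that

$$\mathbb{P}\Bigl(\frac{1}{L}\hat{\mathscr L}(\hat\vartheta^L, y^L) \xrightarrow[L\to\infty]{} \mathscr Q(\vartheta_0)\Bigr) \ge \mathbb{P}\Bigl(\sup_{\vartheta\in\Theta}\Bigl|\frac{1}{L}\hat{\mathscr L}(\vartheta, y^L) - \mathscr Q(\vartheta)\Bigr| \xrightarrow[L\to\infty]{} 0\Bigr) = 1. \qquad (2.25)$$

To complete the proof of the theorem, it suffices to show that, for every neighbourhood $U$ of $\vartheta_0$, with
probability one, $\hat\vartheta^L$ will eventually lie in $U$. For every such neighbourhood $U$ of $\vartheta_0$, we define the real
number $\delta(U) := \inf_{\vartheta\in\Theta\setminus U}\mathscr Q(\vartheta) - \mathscr Q(\vartheta_0)$, which is strictly positive by Lemma 2.10. Then the following
sequence of inequalities holds:

$$\mathbb{P}\bigl(\hat\vartheta^L \xrightarrow[L\to\infty]{} \vartheta_0\bigr) = \mathbb{P}\bigl(\forall U\ \exists L_0 : \hat\vartheta^L \in U\ \ \forall L > L_0\bigr)$$
$$\ge \mathbb{P}\bigl(\forall U\ \exists L_0 : \mathscr Q(\hat\vartheta^L) - \mathscr Q(\vartheta_0) < \delta(U)\ \ \forall L > L_0\bigr)$$
$$\ge \mathbb{P}\bigl(\forall U\ \exists L_0 : \bigl|L^{-1}\hat{\mathscr L}(\hat\vartheta^L, y^L) - \mathscr Q(\vartheta_0)\bigr| < \delta(U)/2 \text{ and } \bigl|L^{-1}\hat{\mathscr L}(\hat\vartheta^L, y^L) - \mathscr Q(\hat\vartheta^L)\bigr| < \delta(U)/2\ \ \forall L > L_0\bigr).$$

The last probability is equal to one by Eq. (2.25) and Lemma 2.8. □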
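The consistency statement can be illustrated on a toy model. The sketch below is a hedged example, not the simulation study of Section 4.2: it uses a scalar AR(1) process in innovations form, for which the pseudo-innovations are $\varepsilon_{\varphi,n} = Y_n - \varphi Y_{n-1}$, profiles out the innovation variance, and minimises the Gaussian QL function over a grid. The driving noise is deliberately uniform rather than Gaussian. All names and numerical choices are hypothetical.

```python
import math
import random


def quasi_likelihood(y, phi):
    """Gaussian QL of a scalar AR(1) model with the innovation variance profiled out."""
    eps = [y[n] - phi * y[n - 1] for n in range(1, len(y))]
    v = sum(e * e for e in eps) / len(eps)
    # sum_n [log(2 pi) + log v + eps_n^2 / v] with v equal to the mean of eps_n^2
    return len(eps) * (math.log(2.0 * math.pi) + math.log(v) + 1.0)


def qml_estimate(y, grid):
    """Minimise the QL function over a finite grid standing in for a compact Theta."""
    return min(grid, key=lambda phi: quasi_likelihood(y, phi))


def simulate(phi0, length, seed):
    """AR(1) path driven by centred uniform (hence non-Gaussian) noise."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(length):
        y.append(phi0 * y[-1] + rng.uniform(-1.0, 1.0))
    return y


if __name__ == "__main__":
    grid = [i / 100.0 for i in range(-95, 96)]
    for length in (100, 1000, 5000):
        print(length, qml_estimate(simulate(0.6, length, seed=1), grid))
```

As the sample size grows, the printed estimates should settle near the true value 0.6 even though the Gaussian likelihood is misspecified for the noise.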

2.4. Proof of Theorem 2.5 - Asymptotic normality 

In this section we prove the assertion of Theorem 2.5, that the distribution of $L^{1/2}(\hat\vartheta^L - \vartheta_0)$ converges to
a normal random variable with mean zero and covariance matrix $\Xi = J^{-1}IJ^{-1}$, an expression for which is
given in Eq. (2.20).

The idea behind the proof of the asymptotic normality essentially is that the strong mixing property
implies various central limit theorems. As already said, the QML estimators are intuitively close to moment
based estimators. So the main task is to show that the central limit results translate into asymptotic normality
of the estimators. The individual steps in the following, again lengthy and technical, proof are:

1. First we extend the result that the pseudo-innovations $\hat\varepsilon_\vartheta$ obtained via the Kalman filter from the finite
sample approximate the true pseudo-innovations $\varepsilon_\vartheta$ (obtainable from the steady state Kalman filter
in theory) well to their first and second derivatives; see Lemma 2.11.

2. The first derivatives of the QL function $\mathscr L$ obtained from the pseudo-innovations $\varepsilon_\vartheta$ have a finite
variance for every possible parameter $\vartheta$; see Lemma 2.12.

3. Certain fourth moments (viz. covariances of scalar products of the vectors of values of the process
at different times) of a strongly mixing process with $4 + \delta$ finite moments can be uniformly bounded
using the strong mixing coefficients; see Lemma 2.13.

4. The covariance matrix of the gradients of the QL function $\mathscr L$ divided by the number of observations
converges for every possible parameter $\vartheta$; see Lemma 2.14.

5. The result that the quasi likelihood (QL) function $\hat{\mathscr L}$ obtained from the finite sample (via $\hat\varepsilon_\vartheta$)
converges for the sample size $L \to \infty$ uniformly in the parameter space to the true QL function
$\mathscr L$ (obtained from the pseudo-innovations $\varepsilon_\vartheta$) is extended to the first and second derivatives; see
Lemma 2.15.

6. The previous steps allow us to show that the gradient of the QL function $\hat{\mathscr L}$ at the true parameter $\vartheta_0$, divided by the
square root of the number of observations, is asymptotically normal with limiting variance determined in step 4; see
Lemma 2.16.

7. The limit of the rescaled second derivative of the QL function $\hat{\mathscr L}$ at the true parameter $\vartheta_0$ exists, equals
the Fisher information and is invertible; see Lemma 2.17.

8. A zeroth order Taylor expansion of the gradient of the QL function $\hat{\mathscr L}$ divided by the number of
observations at the true parameter $\vartheta_0$ is combined with the asymptotic normality result of step 6 and
the already established strong consistency of the QML estimator $\hat\vartheta^L$. Using the third derivatives of $\hat{\mathscr L}$,
the error of the Taylor approximation expressed in terms of second derivatives of $\hat{\mathscr L}$ is controlled,
and using the result of step 7 the asymptotic normality of the QML estimator is deduced.

First, we collect basic properties of $\partial_m\varepsilon_{\vartheta,n}$ and $\partial_m\hat\varepsilon_{\vartheta,n}$, where $\partial_m = \partial/\partial\vartheta^m$ denotes the partial derivative
with respect to the $m$th component of $\vartheta$; the following lemma mirrors Lemma 2.6.

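Lemma 2.11. If Assumptions D1 to D3 and D7 hold, the pseudo-innovations sequences $\varepsilon_\vartheta$ and $\hat\varepsilon_\vartheta$ defined
by the Kalman filter equations (2.7a) and (2.13) have the following properties.

i) If, for an integer $k \in \{1, \ldots, r\}$, the initial values $\hat X_{\vartheta,\mathrm{initial}}$ are such that both $\sup_{\vartheta\in\Theta}\|\hat X_{\vartheta,\mathrm{initial}}\|$ and
$\sup_{\vartheta\in\Theta}\|\partial_k\hat X_{\vartheta,\mathrm{initial}}\|$ are almost surely finite, then, with probability one, there exist positive numbers $C$
and $\rho < 1$, such that $\sup_{\vartheta\in\Theta}\|\partial_k\varepsilon_{\vartheta,n} - \partial_k\hat\varepsilon_{\vartheta,n}\| \le C\rho^n$, $n \in \mathbb{N}$.

ii) For each $k \in \{1, \ldots, r\}$, the random sequences $\partial_k\varepsilon_\vartheta$ are linear functions of $Y$, i.e. there exist matrix
sequences $(c^{(k)}_{\vartheta,\nu})_{\nu\ge 1}$ such that $\partial_k\varepsilon_{\vartheta,n} = \sum_{\nu=1}^{\infty}c^{(k)}_{\vartheta,\nu}Y_{n-\nu}$. The matrices $c^{(k)}_{\vartheta,\nu}$ are uniformly exponentially
bounded, i.e. there exist positive numbers $C$ and $\rho < 1$, such that $\sup_{\vartheta\in\Theta}\|c^{(k)}_{\vartheta,\nu}\| \le C\rho^\nu$, $\nu \in \mathbb{N}$.

iii) If, for integers $k, l \in \{1, \ldots, r\}$, the initial values $\hat X_{\vartheta,\mathrm{initial}}$ are such that $\sup_{\vartheta\in\Theta}\|\hat X_{\vartheta,\mathrm{initial}}\|$, as well as
$\sup_{\vartheta\in\Theta}\|\partial_i\hat X_{\vartheta,\mathrm{initial}}\|$, $i \in \{k, l\}$, and $\sup_{\vartheta\in\Theta}\|\partial^2_{k,l}\hat X_{\vartheta,\mathrm{initial}}\|$ are almost surely finite, then, with probability
one, there exist positive numbers $C$ and $\rho < 1$, such that $\sup_{\vartheta\in\Theta}\|\partial^2_{k,l}\varepsilon_{\vartheta,n} - \partial^2_{k,l}\hat\varepsilon_{\vartheta,n}\| \le C\rho^n$, $n \in \mathbb{N}$.

iv) For each $k, l \in \{1, \ldots, r\}$, the random sequences $\partial^2_{k,l}\varepsilon_\vartheta$ are linear functions of $Y$, i.e. there exist matrix
sequences $(c^{(k,l)}_{\vartheta,\nu})_{\nu\ge 1}$ such that $\partial^2_{k,l}\varepsilon_{\vartheta,n} = \sum_{\nu=1}^{\infty}c^{(k,l)}_{\vartheta,\nu}Y_{n-\nu}$. The matrices $c^{(k,l)}_{\vartheta,\nu}$ are uniformly exponentially
bounded, i.e. there exist positive numbers $C$ and $\rho < 1$, such that $\sup_{\vartheta\in\Theta}\|c^{(k,l)}_{\vartheta,\nu}\| \le C\rho^\nu$, $\nu \in \mathbb{N}$.

Proof. Analogous to the proof of Lemma 2.6, repeatedly interchanging differentiation and summation, and
using the fact that, as a consequence of Assumptions D1 to D3 and D7, both
$\partial_k\bigl[H_\vartheta(F_\vartheta - K_\vartheta H_\vartheta)^{\nu-1}K_\vartheta\bigr]$
and $\partial^2_{k,l}\bigl[H_\vartheta(F_\vartheta - K_\vartheta H_\vartheta)^{\nu-1}K_\vartheta\bigr]$ are uniformly exponentially bounded. □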

Lemma 2.12. For each $\vartheta \in \Theta$ and every $m = 1, \ldots, r$, the random variable $\partial_m\mathscr L(\vartheta, y^L)$ has finite
variance.

Proof. The claim follows from Assumption D8, the exponential decay of the coefficient matrices $c_{\vartheta,\nu}$ and
$c^{(m)}_{\vartheta,\nu}$ proved in Lemma 2.6, ii) and Lemma 2.11, and the Cauchy-Schwarz inequality. □

We need the following covariance inequality which is a consequence of Davydov's inequality and the
multidimensional generalization of an inequality used in the proof of Francq and Zakoïan (1998, Lemma
3). For a positive real number $a$, we denote by $\lfloor a\rfloor$ the greatest integer smaller than or equal to $a$.

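Lemma 2.13. Let $X$ be a strictly stationary, strongly mixing $d$-dimensional stochastic process with finite
$(4 + \delta)$th moments for some $\delta > 0$. Then there exists a constant $\kappa$, such that for all $d \times d$ matrices $A$, $B$,
every $n \in \mathbb{Z}$, $\Delta \in \mathbb{N}$, and time indices $\nu, \nu' \in \mathbb{N}_0$, $\mu, \mu' = 0, 1, \ldots, \lfloor\Delta/2\rfloor$, it holds that

$$\operatorname{Cov}\bigl(X_{n-\nu}^T A X_{n-\nu'};\ X_{n+\Delta-\mu}^T B X_{n+\Delta-\mu'}\bigr) \le \kappa\,\|A\|\,\|B\|\,\Bigl[\alpha_X\Bigl(\Bigl\lfloor\frac{\Delta}{2}\Bigr\rfloor\Bigr)\Bigr]^{\delta/(2+\delta)}, \qquad (2.26)$$

where $\alpha_X$ denote the strong mixing coefficients of the process $X$.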

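Proof. We first note that the bilinearity of $\operatorname{Cov}(\cdot\,;\cdot)$ and the elementary inequality $M_{ij} \le \|M\|$, $M \in M_d(\mathbb{R})$,
imply that

$$\operatorname{Cov}\bigl(X_{n-\nu}^T A X_{n-\nu'};\ X_{n+\Delta-\mu}^T B X_{n+\Delta-\mu'}\bigr) \le d^4\,\|A\|\,\|B\|\max_{i,j,s,t=1,\ldots,d}\operatorname{Cov}\bigl(X^i_{n-\nu}X^j_{n-\nu'};\ X^s_{n+\Delta-\mu}X^t_{n+\Delta-\mu'}\bigr).$$

Since the projection which maps a vector to one of its components is measurable, it follows that $X^i_{n-\nu}X^j_{n-\nu'}$ is
measurable with respect to $\mathscr F_{-\infty}^{\,n-\min\{\nu,\nu'\}}$, the $\sigma$-algebra generated by $\{X_k : -\infty < k \le n - \min\{\nu,\nu'\}\}$. Similarly,
the random variable $X^s_{n+\Delta-\mu}X^t_{n+\Delta-\mu'}$ is measurable with respect to $\mathscr F^{\,\infty}_{n+\Delta-\max\{\mu,\mu'\}}$. Davydov's inequality
(Davydov, 1968, Lemma 2.1) implies that there exists a universal constant $K$ such that

$$\operatorname{Cov}\bigl(X^i_{n-\nu}X^j_{n-\nu'};\ X^s_{n+\Delta-\mu}X^t_{n+\Delta-\mu'}\bigr) \le K\Bigl(\mathbb{E}\bigl|X^i_{n-\nu}X^j_{n-\nu'}\bigr|^{2+\delta}\Bigr)^{1/(2+\delta)}\Bigl(\mathbb{E}\bigl|X^s_{n+\Delta-\mu}X^t_{n+\Delta-\mu'}\bigr|^{2+\delta}\Bigr)^{1/(2+\delta)}$$
$$\times\bigl[\alpha_X\bigl(\Delta - \max\{\mu,\mu'\} + \min\{\nu,\nu'\}\bigr)\bigr]^{\delta/(2+\delta)} \le \kappa\Bigl[\alpha_X\Bigl(\Bigl\lfloor\frac{\Delta}{2}\Bigr\rfloor\Bigr)\Bigr]^{\delta/(2+\delta)},$$

where it was used that $\Delta - \max\{\mu,\mu'\} + \min\{\nu,\nu'\} \ge \lfloor\Delta/2\rfloor$, and that strong mixing coefficients are
non-increasing. By the Cauchy-Schwarz inequality the constant $\kappa$ satisfies

$$\kappa = K\Bigl(\mathbb{E}\bigl|X^i_{n-\nu}X^j_{n-\nu'}\bigr|^{2+\delta}\Bigr)^{1/(2+\delta)}\Bigl(\mathbb{E}\bigl|X^s_{n+\Delta-\mu}X^t_{n+\Delta-\mu'}\bigr|^{2+\delta}\Bigr)^{1/(2+\delta)} \le K\bigl(\mathbb{E}\|X_1\|^{4+2\delta}\bigr)^{2/(2+\delta)},$$

and thus does not depend on $n, \nu, \nu', \mu, \mu', \Delta$, nor on $i, j, s, t$. □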

The next lemma is a multivariate generalization of Francq and Zakoïan (1998, Lemma 3). In the proof
of Boubacar Mainassara and Francq (2011, Lemma 4) this generalization is used without providing details
and, more importantly, without imposing Assumption D9 about the strong mixing of $Y$. In view of the
derivative terms $\partial_m\varepsilon_{\vartheta,n}$ in Eq. (2.28) it is not immediately clear how the result of the lemma can be proved
under the mere assumption of strong mixing of the innovations sequence $\varepsilon_{\vartheta_0}$. We therefore think that a
detailed account, properly generalizing the arguments in the original paper (Francq and Zakoïan, 1998) to
the multidimensional setting, is justified.

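Lemma 2.14. Suppose that Assumptions D1 to D3, D8 and D9 hold. Then, for every $\vartheta \in \Theta$, the sequence
$L^{-1}\operatorname{Var}\nabla_\vartheta\mathscr L(\vartheta, y^L)$ of deterministic matrices converges to a limit $I(\vartheta)$ as $L \to \infty$.

Proof. It is enough to show that, for each $\vartheta \in \Theta$, and all $k, l = 1, \ldots, r$, the sequence of real
numbers $I^{(k,l)}_{\vartheta,L}$, defined by

$$I^{(k,l)}_{\vartheta,L} = \frac{1}{L}\sum_{n=1}^{L}\sum_{t=1}^{L}\operatorname{Cov}\bigl(\ell^{(k)}_{\vartheta,n};\ \ell^{(l)}_{\vartheta,t}\bigr), \qquad (2.27)$$

converges to a limit as $L$ tends to infinity, where $\ell^{(m)}_{\vartheta,n} = \partial_m l_{\vartheta,n}$ is the partial derivative of the $n$th term $l_{\vartheta,n}$ in
expression (2.11) for $\mathscr L(\vartheta, y^L)$. It follows from well-known differentiation rules for matrix functions (see,
e.g. Horn and Johnson, 1994, Sections 6.5 and 6.6) that

$$\ell^{(m)}_{\vartheta,n} = \operatorname{tr}\bigl[V_\vartheta^{-1}\bigl(1_d - \varepsilon_{\vartheta,n}\varepsilon_{\vartheta,n}^T V_\vartheta^{-1}\bigr)(\partial_m V_\vartheta)\bigr] + 2\bigl(\partial_m\varepsilon_{\vartheta,n}^T\bigr)V_\vartheta^{-1}\varepsilon_{\vartheta,n}. \qquad (2.28)$$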
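The differentiation rule behind Eq. (2.28) can be sanity-checked numerically in the scalar case $d = r = 1$, where the summand reduces to $\log v + \varepsilon^2/v$ (the constant $\log 2\pi$ has zero derivative). The sketch below compares the closed-form derivative with a central finite difference; the maps $v(\theta) = 1 + \theta^2$ and $\varepsilon(\theta) = y_n - \theta y_{n-1}$ are toy choices made here and are not taken from the paper.

```python
import math


def term(theta, y_now, y_prev):
    """Scalar summand l = log v + eps^2 / v for the toy maps v(theta), eps(theta)."""
    v = 1.0 + theta * theta
    eps = y_now - theta * y_prev
    return math.log(v) + eps * eps / v


def score(theta, y_now, y_prev):
    """Scalar form of Eq. (2.28): v^{-1} (1 - eps^2 / v) dv + 2 (d eps) eps / v."""
    v, dv = 1.0 + theta * theta, 2.0 * theta
    eps, deps = y_now - theta * y_prev, -y_prev
    return (1.0 - eps * eps / v) * dv / v + 2.0 * deps * eps / v


def finite_difference(theta, y_now, y_prev, step=1e-6):
    """Central difference approximation of the derivative of term in theta."""
    upper = term(theta + step, y_now, y_prev)
    lower = term(theta - step, y_now, y_prev)
    return (upper - lower) / (2.0 * step)


if __name__ == "__main__":
    print(score(0.3, 1.2, -0.7), finite_difference(0.3, 1.2, -0.7))
```

Agreement of the two printed numbers to several digits is what one would expect if the trace term and the quadratic term in Eq. (2.28) carry the signs stated there.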

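By the assumed stationarity of the processes $\varepsilon_\vartheta$, the covariances in the sum (2.27) depend only on the
difference $n - t$. For the proof of the lemma it suffices to show that the sequence
$c^{(k,l)}_{\vartheta,\Delta} = \operatorname{Cov}\bigl(\ell^{(k)}_{\vartheta,n};\ \ell^{(l)}_{\vartheta,n+\Delta}\bigr)$,
$\Delta \in \mathbb{Z}$, is absolutely summable for all $k, l = 1, \ldots, r$, because then

$$I^{(k,l)}_{\vartheta,L} = \frac{1}{L}\sum_{\Delta=-L}^{L}(L - |\Delta|)\,c^{(k,l)}_{\vartheta,\Delta} \xrightarrow[L\to\infty]{} \sum_{\Delta\in\mathbb{Z}}c^{(k,l)}_{\vartheta,\Delta} < \infty. \qquad (2.29)$$

In view of the symmetry $c^{(k,l)}_{\vartheta,\Delta} = c^{(l,k)}_{\vartheta,-\Delta}$, it is no restriction to assume that $\Delta \in \mathbb{N}$. In order to show that
$\sum_\Delta\bigl|c^{(k,l)}_{\vartheta,\Delta}\bigr|$ is finite, we first use the bilinearity of $\operatorname{Cov}(\cdot\,;\cdot)$ to estimate

$$\bigl|c^{(k,l)}_{\vartheta,\Delta}\bigr| \le 4\,\bigl|\operatorname{Cov}\bigl((\partial_k\varepsilon_{\vartheta,n}^T)V_\vartheta^{-1}\varepsilon_{\vartheta,n};\ (\partial_l\varepsilon_{\vartheta,n+\Delta}^T)V_\vartheta^{-1}\varepsilon_{\vartheta,n+\Delta}\bigr)\bigr|$$
$$+ \bigl|\operatorname{Cov}\bigl(\operatorname{tr}\bigl[V_\vartheta^{-1}\varepsilon_{\vartheta,n}\varepsilon_{\vartheta,n}^T V_\vartheta^{-1}\partial_k V_\vartheta\bigr];\ \operatorname{tr}\bigl[V_\vartheta^{-1}\varepsilon_{\vartheta,n+\Delta}\varepsilon_{\vartheta,n+\Delta}^T V_\vartheta^{-1}\partial_l V_\vartheta\bigr]\bigr)\bigr|$$
$$+ 2\,\bigl|\operatorname{Cov}\bigl(\operatorname{tr}\bigl[V_\vartheta^{-1}\varepsilon_{\vartheta,n}\varepsilon_{\vartheta,n}^T V_\vartheta^{-1}\partial_k V_\vartheta\bigr];\ (\partial_l\varepsilon_{\vartheta,n+\Delta}^T)V_\vartheta^{-1}\varepsilon_{\vartheta,n+\Delta}\bigr)\bigr|$$
$$+ 2\,\bigl|\operatorname{Cov}\bigl((\partial_k\varepsilon_{\vartheta,n}^T)V_\vartheta^{-1}\varepsilon_{\vartheta,n};\ \operatorname{tr}\bigl[V_\vartheta^{-1}\varepsilon_{\vartheta,n+\Delta}\varepsilon_{\vartheta,n+\Delta}^T V_\vartheta^{-1}\partial_l V_\vartheta\bigr]\bigr)\bigr|.$$

Each of these four terms can be analysed separately. We give details only for the first one, the arguments
for the other three terms being similar. Using the moving average representations for $\varepsilon_\vartheta$, $\partial_k\varepsilon_\vartheta$ and $\partial_l\varepsilon_\vartheta$, it
follows that

$$\bigl|\operatorname{Cov}\bigl((\partial_k\varepsilon_{\vartheta,n}^T)V_\vartheta^{-1}\varepsilon_{\vartheta,n};\ (\partial_l\varepsilon_{\vartheta,n+\Delta}^T)V_\vartheta^{-1}\varepsilon_{\vartheta,n+\Delta}\bigr)\bigr| \le \sum_{\nu,\nu',\mu,\mu'=0}^{\infty}\bigl|\operatorname{Cov}\bigl(Y_{n-\nu}^T c^{(k),T}_{\vartheta,\nu}V_\vartheta^{-1}c_{\vartheta,\nu'}Y_{n-\nu'};\ Y_{n+\Delta-\mu}^T c^{(l),T}_{\vartheta,\mu}V_\vartheta^{-1}c_{\vartheta,\mu'}Y_{n+\Delta-\mu'}\bigr)\bigr|.$$

This sum can be split into one part $I^+$ in which at least one of the summation indices $\nu$, $\nu'$, $\mu$ and $\mu'$ exceeds
$\Delta/2$, and one part $I^-$ in which all summation indices are less than or equal to $\Delta/2$. Using the fact that, by
the Cauchy-Schwarz inequality,

$$\bigl|\operatorname{Cov}\bigl(Y_{n-\nu}^T c^{(k),T}_{\vartheta,\nu}V_\vartheta^{-1}c_{\vartheta,\nu'}Y_{n-\nu'};\ Y_{n+\Delta-\mu}^T c^{(l),T}_{\vartheta,\mu}V_\vartheta^{-1}c_{\vartheta,\mu'}Y_{n+\Delta-\mu'}\bigr)\bigr| \le \|V_\vartheta^{-1}\|^2\,\|c^{(k)}_{\vartheta,\nu}\|\,\|c_{\vartheta,\nu'}\|\,\|c^{(l)}_{\vartheta,\mu}\|\,\|c_{\vartheta,\mu'}\|\,\mathbb{E}\|Y_n\|^4,$$

it follows from Assumption D8 and the uniform exponential decay of $\|c_{\vartheta,\nu}\|$ and $\|c^{(m)}_{\vartheta,\nu}\|$ proved in Lemma 2.6,
ii) and Lemma 2.11, ii) that there exist constants $C$ and $\rho < 1$ such that

$$I^+ = \sum_{\substack{\nu,\nu',\mu,\mu'=0\\ \max\{\nu,\nu',\mu,\mu'\}>\Delta/2}}^{\infty}\bigl|\operatorname{Cov}\bigl(Y_{n-\nu}^T c^{(k),T}_{\vartheta,\nu}V_\vartheta^{-1}c_{\vartheta,\nu'}Y_{n-\nu'};\ Y_{n+\Delta-\mu}^T c^{(l),T}_{\vartheta,\mu}V_\vartheta^{-1}c_{\vartheta,\mu'}Y_{n+\Delta-\mu'}\bigr)\bigr| \le C\rho^{\Delta/2}. \qquad (2.30)$$

For the contribution from all indices smaller than or equal to $\Delta/2$, Lemma 2.13 implies that there exists a
constant $C$ such that

$$I^- = \sum_{\nu,\nu',\mu,\mu'=0}^{\lfloor\Delta/2\rfloor}\bigl|\operatorname{Cov}\bigl(Y_{n-\nu}^T c^{(k),T}_{\vartheta,\nu}V_\vartheta^{-1}c_{\vartheta,\nu'}Y_{n-\nu'};\ Y_{n+\Delta-\mu}^T c^{(l),T}_{\vartheta,\mu}V_\vartheta^{-1}c_{\vartheta,\mu'}Y_{n+\Delta-\mu'}\bigr)\bigr| \le C\Bigl[\alpha_Y\Bigl(\Bigl\lfloor\frac{\Delta}{2}\Bigr\rfloor\Bigr)\Bigr]^{\delta/(2+\delta)}. \qquad (2.31)$$

It thus follows from Assumption D9 that the sequences $\bigl|c^{(k,l)}_{\vartheta,\Delta}\bigr|$, $\Delta \in \mathbb{N}$, are summable, and Eq. (2.29)
completes the proof of the lemma. □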

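Lemma 2.15. Let $\mathscr L$ and $\hat{\mathscr L}$ be given by Eqs. (2.11) and (2.15). Assume that Assumptions D1 to D3
and D7 are satisfied. Then the following hold.

i) For each $m = 1, \ldots, r$, the sequence $L^{-1/2}\sup_{\vartheta\in\Theta}\bigl|\partial_m\hat{\mathscr L}(\vartheta, y^L) - \partial_m\mathscr L(\vartheta, y^L)\bigr|$ converges to zero in
probability, as $L \to \infty$.

ii) For all $k, l = 1, \ldots, r$, the sequence $L^{-1}\sup_{\vartheta\in\Theta}\bigl|\partial^2_{k,l}\hat{\mathscr L}(\vartheta, y^L) - \partial^2_{k,l}\mathscr L(\vartheta, y^L)\bigr|$ converges to zero almost
surely, as $L \to \infty$.

Proof. Similar to the proof of Lemma 2.7. □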

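Lemma 2.16. Under Assumptions D1, D3 and D7 to D9, the random variable $L^{-1/2}\nabla_\vartheta\hat{\mathscr L}(\vartheta_0, y^L)$ is
asymptotically normally distributed with mean zero and covariance matrix $I(\vartheta_0)$.

Proof. Because of Lemma 2.15, i) it is enough to show that $L^{-1/2}\nabla_\vartheta\mathscr L(\vartheta_0, y^L)$ is asymptotically normally
distributed with mean zero and covariance matrix $I(\vartheta_0)$. First, we note that

$$\partial_l\mathscr L(\vartheta, y^L) = \sum_{n=1}^{L}\Bigl\{\operatorname{tr}\bigl[V_\vartheta^{-1}\bigl(1_d - \varepsilon_{\vartheta,n}\varepsilon_{\vartheta,n}^T V_\vartheta^{-1}\bigr)\partial_l V_\vartheta\bigr] + 2\bigl(\partial_l\varepsilon_{\vartheta,n}^T\bigr)V_\vartheta^{-1}\varepsilon_{\vartheta,n}\Bigr\}, \qquad (2.32)$$

which holds for every component $l = 1, \ldots, r$. The facts that $\mathbb{E}\,\varepsilon_{\vartheta_0,n}\varepsilon_{\vartheta_0,n}^T$ equals $V_{\vartheta_0}$, and that $\varepsilon_{\vartheta_0,n}$ is
orthogonal to the Hilbert space generated by $\{Y_t, t < n\}$, of which $\partial_l\varepsilon_{\vartheta_0,n}$ is an element, show that
$\mathbb{E}\,\partial_l\mathscr L(\vartheta_0, y^L) = 0$. Using Lemma 2.6, ii), expression (2.32) can be rewritten as

$$\partial_l\mathscr L(\vartheta_0, y^L) = \sum_{n=1}^{L}\bigl[\mathscr Y^{(l)}_{m,n} - \mathbb{E}\mathscr Y^{(l)}_{m,n}\bigr] + \sum_{n=1}^{L}\bigl[\mathscr Z^{(l)}_{m,n} - \mathbb{E}\mathscr Z^{(l)}_{m,n}\bigr],$$

where, for every $m \in \mathbb{N}$, the processes $\mathscr Y^{(l)}_m$ and $\mathscr Z^{(l)}_m$ are defined by

$$\mathscr Y^{(l)}_{m,n} = \sum_{\nu,\nu'=0}^{m}\Bigl\{-\operatorname{tr}\bigl[V_{\vartheta_0}^{-1}c_{\vartheta_0,\nu}Y_{n-\nu}Y_{n-\nu'}^T c_{\vartheta_0,\nu'}^T V_{\vartheta_0}^{-1}(\partial_l V_{\vartheta_0})\bigr] + 2\,Y_{n-\nu}^T c^{(l),T}_{\vartheta_0,\nu}V_{\vartheta_0}^{-1}c_{\vartheta_0,\nu'}Y_{n-\nu'}\Bigr\}, \qquad (2.33a)$$

$$\mathscr Z^{(l)}_{m,n} = \mathscr U^{(l)}_{m,n} + \mathscr V^{(l)}_{m,n}, \qquad (2.33b)$$

and

$$\mathscr U^{(l)}_{m,n} = \sum_{\nu=0}^{\infty}\sum_{\nu'=m+1}^{\infty}\Bigl\{-\operatorname{tr}\bigl[V_{\vartheta_0}^{-1}c_{\vartheta_0,\nu}Y_{n-\nu}Y_{n-\nu'}^T c_{\vartheta_0,\nu'}^T V_{\vartheta_0}^{-1}(\partial_l V_{\vartheta_0})\bigr] + 2\,Y_{n-\nu}^T c^{(l),T}_{\vartheta_0,\nu}V_{\vartheta_0}^{-1}c_{\vartheta_0,\nu'}Y_{n-\nu'}\Bigr\},$$

$$\mathscr V^{(l)}_{m,n} = \sum_{\nu=m+1}^{\infty}\sum_{\nu'=0}^{m}\Bigl\{-\operatorname{tr}\bigl[V_{\vartheta_0}^{-1}c_{\vartheta_0,\nu}Y_{n-\nu}Y_{n-\nu'}^T c_{\vartheta_0,\nu'}^T V_{\vartheta_0}^{-1}(\partial_l V_{\vartheta_0})\bigr] + 2\,Y_{n-\nu}^T c^{(l),T}_{\vartheta_0,\nu}V_{\vartheta_0}^{-1}c_{\vartheta_0,\nu'}Y_{n-\nu'}\Bigr\}.$$

It is convenient to also introduce the notations

$$\mathscr Y_{m,n} = \bigl(\mathscr Y^{(1)}_{m,n}\ \cdots\ \mathscr Y^{(r)}_{m,n}\bigr)^T \quad\text{and}\quad \mathscr Z_{m,n} = \bigl(\mathscr Z^{(1)}_{m,n}\ \cdots\ \mathscr Z^{(r)}_{m,n}\bigr)^T. \qquad (2.34)$$

The rest of the proof proceeds in three steps: in the first we show that, for each natural number $m$, the
sequence $L^{-1/2}\sum_n\bigl[\mathscr Y_{m,n} - \mathbb{E}\mathscr Y_{m,n}\bigr]$ is asymptotically normally distributed with asymptotic covariance matrix
$I_m$, and that $I_m$ converges to $I(\vartheta_0)$ as $m$ tends to infinity. We then prove that $L^{-1/2}\sum_n\bigl[\mathscr Z_{m,n} - \mathbb{E}\mathscr Z_{m,n}\bigr]$ goes
to zero uniformly in $L$, as $m \to \infty$, and the last step is devoted to combining the first two steps to prove the
asymptotic normality of $L^{-1/2}\nabla_\vartheta\mathscr L(\vartheta_0, y^L)$.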

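Step 1 Since $Y$ is stationary, it is clear that $\mathscr Y_m$ is a stationary process. Moreover, the strong mixing
coefficients $\alpha_{\mathscr Y_m}(k)$ of $\mathscr Y_m$ satisfy $\alpha_{\mathscr Y_m}(k) \le \alpha_Y(\max\{0, k - m\})$ because $\mathscr Y_{m,n}$ depends only on the finitely
many values $Y_{n-m}, \ldots, Y_n$ of $Y$ (see Bradley, 2007, Remark 1.8 b)). In particular, by Assumption D9, the
strong mixing coefficients of the processes $\mathscr Y_m$ satisfy the summability condition $\sum_k[\alpha_{\mathscr Y_m}(k)]^{\delta/(2+\delta)} < \infty$.
Since, by the Cramér-Wold device, weak convergence of the sequence $L^{-1/2}\sum_{n=1}^{L}\bigl[\mathscr Y_{m,n} - \mathbb{E}\mathscr Y_{m,n}\bigr]$ to a
multivariate normal distribution with mean zero and covariance matrix $\Sigma$ is equivalent to the condition that,
for every vector $u \in \mathbb{R}^r$, the sequence $L^{-1/2}u^T\sum_{n=1}^{L}\bigl[\mathscr Y_{m,n} - \mathbb{E}\mathscr Y_{m,n}\bigr]$ converges to a one-dimensional normal
distribution with mean zero and variance $u^T\Sigma u$, we can apply the Central Limit Theorem for univariate
strongly mixing processes (Ibragimov, 1962, Theorem 1.7) to obtain that

$$\frac{1}{\sqrt L}\sum_{n=1}^{L}\bigl[\mathscr Y_{m,n} - \mathbb{E}\mathscr Y_{m,n}\bigr] \xrightarrow[L\to\infty]{d} \mathscr N(0_r, I_m), \quad\text{where } I_m = \sum_{\Delta\in\mathbb{Z}}\operatorname{Cov}\bigl(\mathscr Y_{m,n};\ \mathscr Y_{m,n+\Delta}\bigr). \qquad (2.35)$$

The claim that $I_m$ converges to $I(\vartheta_0)$ will follow if we can show that

$$\operatorname{Cov}\bigl(\mathscr Y^{(k)}_{m,n};\ \mathscr Y^{(l)}_{m,n+\Delta}\bigr) \xrightarrow[m\to\infty]{} \operatorname{Cov}\bigl(\ell^{(k)}_{\vartheta_0,n};\ \ell^{(l)}_{\vartheta_0,n+\Delta}\bigr), \qquad \forall\Delta \in \mathbb{Z}, \qquad (2.36)$$

and that $\bigl|\operatorname{Cov}\bigl(\mathscr Y^{(k)}_{m,n};\ \mathscr Y^{(l)}_{m,n+\Delta}\bigr)\bigr|$ is dominated by an absolutely summable sequence. For the first condition, we
note that the bilinearity of $\operatorname{Cov}(\cdot\,;\cdot)$ implies that

$$\operatorname{Cov}\bigl(\mathscr Y^{(k)}_{m,n};\ \mathscr Y^{(l)}_{m,n+\Delta}\bigr) - \operatorname{Cov}\bigl(\ell^{(k)}_{\vartheta_0,n};\ \ell^{(l)}_{\vartheta_0,n+\Delta}\bigr) = \operatorname{Cov}\bigl(\mathscr Y^{(k)}_{m,n};\ \mathscr Y^{(l)}_{m,n+\Delta} - \ell^{(l)}_{\vartheta_0,n+\Delta}\bigr) + \operatorname{Cov}\bigl(\mathscr Y^{(k)}_{m,n} - \ell^{(k)}_{\vartheta_0,n};\ \ell^{(l)}_{\vartheta_0,n+\Delta}\bigr).$$

These two terms can be treated in a similar manner so we restrict our attention to the second one. The
definitions of $\mathscr Y^{(k)}_{m,n}$ (Eq. (2.33a)) and $\ell^{(k)}_{\vartheta_0,n}$ (Eq. (2.27)) allow us to compute

$$\mathscr Y^{(k)}_{m,n} - \ell^{(k)}_{\vartheta_0,n} = -\operatorname{tr}\bigl[V_{\vartheta_0}^{-1}\partial_k V_{\vartheta_0}\bigr] - \sum_{\substack{\nu,\nu'\ge 0\\ \max\{\nu,\nu'\}>m}}\Bigl\{-\operatorname{tr}\bigl[V_{\vartheta_0}^{-1}c_{\vartheta_0,\nu}Y_{n-\nu}Y_{n-\nu'}^T c_{\vartheta_0,\nu'}^T V_{\vartheta_0}^{-1}(\partial_k V_{\vartheta_0})\bigr] + 2\,Y_{n-\nu}^T c^{(k),T}_{\vartheta_0,\nu}V_{\vartheta_0}^{-1}c_{\vartheta_0,\nu'}Y_{n-\nu'}\Bigr\}.$$

As a consequence of the Cauchy-Schwarz inequality, Assumption D8 and the exponential bounds in
Lemma 2.6, ii), we therefore obtain that $\operatorname{Var}\bigl(\mathscr Y^{(k)}_{m,n} - \ell^{(k)}_{\vartheta_0,n}\bigr) \le C\rho^m$ independent of $n$. The $L^2$-continuity of
$\operatorname{Cov}(\cdot\,;\cdot)$ thus implies that the sequence $\operatorname{Cov}\bigl(\mathscr Y^{(k)}_{m,n} - \ell^{(k)}_{\vartheta_0,n};\ \ell^{(l)}_{\vartheta_0,n+\Delta}\bigr)$ converges to zero as $m$ tends to infinity at
an exponential rate uniformly in $\Delta$. The existence of a summable sequence dominating
$\bigl|\operatorname{Cov}\bigl(\mathscr Y^{(k)}_{m,n};\ \mathscr Y^{(l)}_{m,n+\Delta}\bigr)\bigr|$
is ensured by the arguments given in the proof of Lemma 2.14, reasoning as in the derivation of Eqs. (2.30)
and (2.31).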

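Step 2 We shall show that there exist positive constants $C$ and $\rho < 1$, independent of $L$, such that

$$\operatorname{tr}\operatorname{Var}\Bigl(\frac{1}{\sqrt L}\sum_{n=1}^{L}\mathscr Z_{m,n}\Bigr) \le C\rho^m, \qquad \mathscr Z_{m,n}\text{ given in Eq. (2.34)}. \qquad (2.37)$$

Since

$$\operatorname{tr}\operatorname{Var}\Bigl(\frac{1}{\sqrt L}\sum_{n=1}^{L}\mathscr Z_{m,n}\Bigr) \le 2\Bigl[\operatorname{tr}\operatorname{Var}\Bigl(\frac{1}{\sqrt L}\sum_{n=1}^{L}\mathscr U_{m,n}\Bigr) + \operatorname{tr}\operatorname{Var}\Bigl(\frac{1}{\sqrt L}\sum_{n=1}^{L}\mathscr V_{m,n}\Bigr)\Bigr], \qquad (2.38)$$

it suffices to consider the latter two terms. We first observe that

$$\operatorname{tr}\operatorname{Var}\Bigl(\frac{1}{\sqrt L}\sum_{n=1}^{L}\mathscr U_{m,n}\Bigr) = \frac{1}{L}\operatorname{tr}\sum_{n,n'=1}^{L}\operatorname{Cov}\bigl(\mathscr U_{m,n};\ \mathscr U_{m,n'}\bigr) = \frac{1}{L}\sum_{k=1}^{r}\sum_{\Delta=-L+1}^{L-1}(L - |\Delta|)\,u^{(k,k)}_{m,\Delta} \le \sum_{k,l=1}^{r}\sum_{\Delta\in\mathbb{Z}}\bigl|u^{(k,l)}_{m,\Delta}\bigr|, \qquad (2.39)$$

where

$$u^{(k,l)}_{m,\Delta} = \operatorname{Cov}\bigl(\mathscr U^{(k)}_{m,n};\ \mathscr U^{(l)}_{m,n+\Delta}\bigr).$$

As before, under Assumption D8, the Cauchy-Schwarz inequality and the exponential bounds for $\|c_{\vartheta_0,\nu}\|$
and $\|c^{(k)}_{\vartheta_0,\nu}\|$ imply that $\bigl|u^{(k,l)}_{m,\Delta}\bigr| \le C\rho^m$. By arguments similar to the ones used in the proof of Lemma 2.13,
Davydov's inequality implies that, for $m < \lfloor\Delta/2\rfloor$, after splitting the sum defining $u^{(k,l)}_{m,\Delta}$ according to whether
the summation indices $\mu, \mu'$ belonging to $\mathscr U^{(l)}_{m,n+\Delta}$ are all at most $\lfloor\Delta/2\rfloor$ or not,

$$\bigl|u^{(k,l)}_{m,\Delta}\bigr| \le C\rho^m\Bigl\{\Bigl[\alpha_Y\Bigl(\Bigl\lfloor\frac{\Delta}{2}\Bigr\rfloor\Bigr)\Bigr]^{\delta/(2+\delta)} + \rho^{\Delta/2}\Bigr\}.$$

It thus follows that, independent of the value of $k$ and $l$,

$$\sum_{\Delta=0}^{\infty}\bigl|u^{(k,l)}_{m,\Delta}\bigr| = \sum_{\Delta=0}^{2m}\bigl|u^{(k,l)}_{m,\Delta}\bigr| + \sum_{\Delta=2m+1}^{\infty}\bigl|u^{(k,l)}_{m,\Delta}\bigr| \le C\rho^m\Bigl\{m + \sum_{\Delta=0}^{\infty}[\alpha_Y(\Delta)]^{\delta/(2+\delta)}\Bigr\},$$

and therefore, by Eq. (2.39) and after enlarging $\rho < 1$ slightly to absorb the factor $m$, that $\operatorname{tr}\operatorname{Var}\bigl(L^{-1/2}\sum_{n=1}^{L}\mathscr U_{m,n}\bigr) \le C\rho^m$. In an analogous way one also can show
that $\operatorname{tr}\operatorname{Var}\bigl(L^{-1/2}\sum_{n=1}^{L}\mathscr V_{m,n}\bigr) \le C\rho^m$, and thus the claim (2.37) follows with Eq. (2.38).
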

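Step 3 In step 1 it has been shown that $L^{-1/2}\sum_n\bigl[\mathscr Y_{m,n} - \mathbb{E}\mathscr Y_{m,n}\bigr] \xrightarrow{d} \mathscr N(0_r, I_m)$, and that $I_m$ converges
to $I(\vartheta_0)$, as $m \to \infty$. In particular, the limiting normal random variables with covariances $I_m$ converge
weakly to a normal random variable with covariance matrix $I(\vartheta_0)$. Step 2 together with the multivariate
Chebyshev inequality implies that, for every $\epsilon > 0$,

$$\lim_{m\to\infty}\limsup_{L\to\infty}\mathbb{P}\Bigl(\Bigl\|\frac{1}{\sqrt L}\nabla_\vartheta\mathscr L(\vartheta_0, y^L) - \frac{1}{\sqrt L}\sum_{n=1}^{L}\bigl[\mathscr Y_{m,n} - \mathbb{E}\mathscr Y_{m,n}\bigr]\Bigr\| > \epsilon\Bigr) \le \lim_{m\to\infty}\limsup_{L\to\infty}\frac{1}{\epsilon^2}\operatorname{tr}\operatorname{Var}\Bigl(\frac{1}{\sqrt L}\sum_{n=1}^{L}\mathscr Z_{m,n}\Bigr) \le \lim_{m\to\infty}\frac{C}{\epsilon^2}\rho^m = 0.$$

Proposition 6.3.9 of Brockwell and Davis (1991) thus completes the proof. □
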

A very important step in the proof of asymptotic normality of QML estimators is to establish that the
Fisher information matrix $J$, evaluated at the true parameter value, is non-singular. We shall now show
that Assumption D10 is sufficient to ensure that $J^{-1}$ exists for linear state space models. For vector ARMA
processes, formulae similar to Eqs. (2.40) below have been derived in the literature (see, e.g., Klein, Mélard
and Saidi, 2008; Klein and Neudecker, 2000); in fact, the resultant property of the Fisher information matrix
of a vector ARMA process implies that $J$ in this case is non-singular if and only if its autoregressive and
moving average polynomials have no common eigenvalues (Klein, Mélard and Spreij, 2005). In conjunction
with the equivalence of linear state space and vector ARMA models this provides an alternative way of
checking that $J$ is non-singular. We continue to work with Assumption D10, however, because it avoids
the transformation of the state space model (2.13) into an equivalent ARMA form.

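Lemma 2.17. Assume that Assumptions D1 to D4, D7 and D10 hold. With probability one, the matrix
$J = \lim_{L\to\infty}L^{-1}\nabla^2_\vartheta\hat{\mathscr L}(\vartheta_0, y^L)$ exists and is non-singular.

Proof. It can be shown as in the proof of Boubacar Mainassara and Francq (2011, Lemma 4) that $J$ exists
and is equal to $J = J_1 + J_2$, where

$$J_1 = 2\,\mathbb{E}\bigl[(\nabla_\vartheta\varepsilon_{\vartheta_0,1})^T V_{\vartheta_0}^{-1}(\nabla_\vartheta\varepsilon_{\vartheta_0,1})\bigr] \quad\text{and}\quad J_2 = \Bigl(\operatorname{tr}\bigl[V_{\vartheta_0}^{-1/2}(\partial_i V_{\vartheta_0})V_{\vartheta_0}^{-1}(\partial_j V_{\vartheta_0})V_{\vartheta_0}^{-1/2}\bigr]\Bigr)_{ij}. \qquad (2.40)$$

$J_2$ is positive semidefinite because it can be written as $J_2 = (b_1\ \cdots\ b_r)^T(b_1\ \cdots\ b_r)$, where
$b_m = \bigl(V_{\vartheta_0}^{-1/2}\otimes V_{\vartheta_0}^{-1/2}\bigr)\operatorname{vec}(\partial_m V_{\vartheta_0})$. Since $J_1$ is positive semidefinite as well, proving that $J$ is
non-singular is equivalent to proving that for any non-zero vector $c \in \mathbb{R}^r$, the numbers $c^T J_i c$, $i = 1, 2$, are
not both zero. Assume, for the sake of contradiction, that there exists a non-zero vector $c = (c_1, \ldots, c_r)^T$ for which both numbers vanish.
The condition $c^T J_1 c = 0$ implies that, almost surely, $\sum_{k=1}^{r}c_k\partial_k\varepsilon_{\vartheta_0,n} = 0_d$, for all $n \in \mathbb{Z}$. It thus follows that
$\sum_{\nu=1}^{\infty}\sum_{k=1}^{r}c_k(\partial_k\mathscr M_{\vartheta_0,\nu})\varepsilon_{\vartheta_0,-\nu} = 0_d$, where the Markov parameters $\mathscr M_{\vartheta,\nu}$ are given by $\mathscr M_{\vartheta,\nu} = -H_\vartheta F_\vartheta^{\nu-1}K_\vartheta$,
$\nu \ge 1$. Since the sequence $\varepsilon_{\vartheta_0}$ is uncorrelated with positive definite covariance matrix, it follows that
$\sum_{k=1}^{r}c_k(\partial_k\mathscr M_{\vartheta_0,\nu}) = 0_d$, for every $\nu \in \mathbb{N}$. Using the relation $\operatorname{vec}(ABC) = (C^T\otimes A)\operatorname{vec}B$ (Bernstein, 2005,
Proposition 7.1.9), we see that the last display is equivalent to
$\nabla_\vartheta\bigl(\bigl[K_{\vartheta_0}^T\otimes H_{\vartheta_0}\bigr]\operatorname{vec}F_{\vartheta_0}^{\nu-1}\bigr)c = 0_{d^2}$, for every
$\nu \in \mathbb{N}$. The condition $c^T J_2 c = 0$ implies that $(\nabla_\vartheta\operatorname{vec}V_{\vartheta_0})c = 0_{d^2}$. By the definition of $\psi_{\vartheta,j}$ in Eq. (2.18)
it thus follows that $\nabla_\vartheta\psi_{\vartheta_0,j}\,c = 0_{(j+2)d^2}$, for every $j \in \mathbb{N}$, which, by Assumption D10, is equivalent to the
contradiction that $c = 0_r$. □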
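The vectorization identity $\operatorname{vec}(ABC) = (C^T \otimes A)\operatorname{vec}B$ used in this proof is easy to verify numerically. The following is a small self-contained sketch on integer matrices, written with plain lists so that no external library is assumed; the helper names are chosen here for illustration.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


def kron(a, b):
    """Kronecker product: block (i, j) of the result is a[i][j] * b."""
    return [[a[i][j] * b[k][l] for j in range(len(a[0])) for l in range(len(b[0]))]
            for i in range(len(a)) for k in range(len(b))]


def vec(a):
    """Stack the columns of a on top of each other, as an (n*m) x 1 matrix."""
    return [[a[i][j]] for j in range(len(a[0])) for i in range(len(a))]


if __name__ == "__main__":
    A = [[1, 2, 0], [3, 4, -1]]        # 2 x 3
    B = [[0, 1], [5, -2], [2, 2]]      # 3 x 2
    C = [[2, 0], [1, 3]]               # 2 x 2
    left = vec(matmul(matmul(A, B), C))
    right = matmul(kron(transpose(C), A), vec(B))
    print(left == right)
```

With $A = H_{\vartheta_0}$, $B = F_{\vartheta_0}^{\nu-1}$ and $C = K_{\vartheta_0}$ this is the step that turns the condition on the Markov parameters into a condition on $[K_{\vartheta_0}^T \otimes H_{\vartheta_0}]\operatorname{vec}F_{\vartheta_0}^{\nu-1}$.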



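Proof of Theorem 2.5. Since the estimate $\hat\vartheta^L$ converges almost surely to $\vartheta_0$ by the consistency result
proved in Theorem 2.4, and $\vartheta_0$ is an element of the interior of $\Theta$ by Assumption D6, the estimate $\hat\vartheta^L$
is an element of the interior of $\Theta$ eventually almost surely. The assumed smoothness of the parametrization
(Assumption D7) implies that the extremal property of $\hat\vartheta^L$ can be expressed as the first order condition
$\nabla_\vartheta\hat{\mathscr L}(\hat\vartheta^L, y^L) = 0_r$. A Taylor expansion of $\vartheta \mapsto \nabla_\vartheta\hat{\mathscr L}(\vartheta, y^L)$ around the point $\vartheta_0$ shows that there exist
parameter vectors $\vartheta_i \in \Theta$ of the form $\vartheta_i = \vartheta_0 + c_i(\hat\vartheta^L - \vartheta_0)$, $0 \le c_i \le 1$, such that

$$0_r = L^{-1/2}\nabla_\vartheta\hat{\mathscr L}(\vartheta_0, y^L) + \frac{1}{L}\nabla^2_\vartheta\hat{\mathscr L}(\underline\vartheta^L, y^L)\,L^{1/2}\bigl(\hat\vartheta^L - \vartheta_0\bigr), \qquad (2.41)$$

where $\nabla^2_\vartheta\hat{\mathscr L}(\underline\vartheta^L, y^L)$ denotes the matrix whose $i$th row, $i = 1, \ldots, r$, is equal to the $i$th row of $\nabla^2_\vartheta\hat{\mathscr L}(\vartheta_i, y^L)$.

By Lemma 2.16 the first term on the right hand side converges weakly to a multivariate normal random
variable with mean zero and covariance matrix $I = I(\vartheta_0)$. As in Lemma 2.8 one can show that the sequence
$\vartheta \mapsto L^{-1}\nabla^3_\vartheta\hat{\mathscr L}(\vartheta, y^L)$, $L \in \mathbb{N}$, of random functions converges almost surely uniformly to the continuous
function $\vartheta \mapsto \nabla^3_\vartheta\mathscr Q(\vartheta)$ taking values in the space $\mathbb{R}^{r\times r\times r}$. Since on the compact space $\Theta$ this function is
bounded in the operator norm obtained from identifying $\mathbb{R}^{r\times r\times r}$ with the space of linear functions from $\mathbb{R}^r$
to $M_r(\mathbb{R})$, that sequence is almost surely uniformly bounded, and we obtain that

$$\Bigl\|\frac{1}{L}\nabla^2_\vartheta\hat{\mathscr L}(\underline\vartheta^L, y^L) - \frac{1}{L}\nabla^2_\vartheta\hat{\mathscr L}(\vartheta_0, y^L)\Bigr\| \le \sup_{\vartheta\in\Theta}\Bigl\|\frac{1}{L}\nabla^3_\vartheta\hat{\mathscr L}(\vartheta, y^L)\Bigr\|\,\bigl\|\hat\vartheta^L - \vartheta_0\bigr\| \xrightarrow[L\to\infty]{\text{a.s.}} 0,$$

because, by Theorem 2.4, the second factor almost surely converges to zero as $L$ tends to infinity. It follows
from Lemma 2.17 that $L^{-1}\nabla^2_\vartheta\hat{\mathscr L}(\underline\vartheta^L, y^L)$ converges to the matrix $J$ almost surely, and thus from Eq. (2.41)
that $L^{1/2}(\hat\vartheta^L - \vartheta_0) \xrightarrow{d} \mathscr N(0_r, J^{-1}IJ^{-1})$, as $L \to \infty$. This shows Eq. (2.19) and completes the proof. □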
3. Quasi maximum likelihood estimation for multivariate continuous-time ARMA processes

In this section we pursue the second main topic of the present paper, a detailed investigation of the asymptotic properties of the QML estimator of discretely observed multivariate continuous-time autoregressive moving average processes. We will make use of the equivalence between MCARMA and continuous-time linear state space models, as well as of the important observation that the state space structure of a continuous-time process is preserved under equidistant sampling, which allows for the results of the previous section to be applied. The conditions we need to impose on the parametrization of the models under consideration are therefore closely related to the assumptions made in the discrete-time case, except that the mixing and ergodicity assumptions D4 and D9 are automatically satisfied (Marquardt and Stelzer, 2007, Proposition 3.34).

We start the section with a short recapitulation of the definition and basic properties of Lévy-driven continuous-time ARMA processes and their equivalence to state space models (based mainly on Marquardt and Stelzer (2007); Schlemm and Stelzer (2012)). Thereafter we work towards applying our results on QML estimation for discrete-time state space models to QML estimators for MCARMA processes, culminating in our main result, Theorem 3.16. To this end we first recall the second order structure of continuous-time state space models and provide auxiliary results on the transfer function in Section 3.2. This is followed in Section 3.3 by recalling that equidistant observations of an MCARMA process follow a state space model in discrete time, as well as discussions of the minimality of a state space model and of how to make the relation between the continuous-time and discrete-time state space models unique. Section 3.4 then looks at the second-order properties of a discretely observed MCARMA process and the aliasing effect. Together, the results of Sections 3.2 to 3.4 allow us to give accessible identifiability conditions needed to apply the QML estimation theory developed in Section 2. Finally, Section 3.5 introduces further technical assumptions needed to employ the theory for strongly mixing state space models and then derives our main result about the consistency and asymptotic normality of the QML estimator for equidistantly sampled MCARMA processes in Theorem 3.16.

3.1. Lévy-driven multivariate CARMA processes and continuous-time state space models

A natural source of randomness in the specification of continuous-time stochastic processes are Lévy processes. For a thorough discussion of these processes we refer the reader to the monographs Applebaum (2004); Sato (1999).

Definition 3.1. A two-sided $\mathbb{R}^m$-valued Lévy process $(L(t))_{t \in \mathbb{R}}$ is a stochastic process, defined on a probability space $(\Omega, \mathscr{F}, \mathbb{P})$, with stationary, independent increments, continuous in probability, and satisfying $L(0) = 0_m$ almost surely.

The characteristic function of a Lévy process $L$ has the Lévy-Khintchine form $\mathbb{E} e^{i \langle u, L(t) \rangle} = \exp\{t \psi^L(u)\}$, $u \in \mathbb{R}^m$, $t \in \mathbb{R}^+$, where the characteristic exponent $\psi^L$ is given by

$$ \psi^L(u) = i \langle \gamma^L, u \rangle - \frac{1}{2} \langle u, \Sigma^{\mathcal{G}} u \rangle + \int_{\mathbb{R}^m} \left[ e^{i \langle u, x \rangle} - 1 - i \langle u, x \rangle I_{\{\|x\| \leq 1\}} \right] \nu^L(dx). \tag{3.1} $$

The vector $\gamma^L \in \mathbb{R}^m$ is called the drift, $\Sigma^{\mathcal{G}}$ is a non-negative definite, symmetric $m \times m$ matrix called the Gaussian covariance matrix, and the Lévy measure $\nu^L$ satisfies the two conditions $\nu^L(\{0_m\}) = 0$ and $\int_{\mathbb{R}^m} \min(\|x\|^2, 1) \nu^L(dx) < \infty$. For the present purpose it is enough to know that a Lévy process $L$ has finite $k$th absolute moments, $k > 0$, that is $\mathbb{E}\|L(t)\|^k < \infty$, if and only if $\int_{\|x\| > 1} \|x\|^k \nu^L(dx) < \infty$ (Sato, 1999, Corollary 25.8), and that the covariance matrix $\Sigma^L$ of $L(1)$, if it exists, is given by $\Sigma^{\mathcal{G}} + \int_{\mathbb{R}^m} x x^T \nu^L(dx)$ (Sato, 1999, Example 25.11).
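Assumption L1. The Lévy process $L$ has mean zero and finite second moments, i.e. $\gamma^L + \int_{\|x\| > 1} x \, \nu^L(dx)$ is zero, and the integral $\int_{\|x\| > 1} \|x\|^2 \nu^L(dx)$ is finite.

Just like i.i.d. sequences are used in time series analysis to define ARMA processes, Lévy processes can be used to construct (multivariate) continuous-time autoregressive moving average processes, called (M)CARMA processes. If $L$ is a two-sided Lévy process with values in $\mathbb{R}^m$ and $p > q$ are integers, the $d$-dimensional $L$-driven MCARMA($p$, $q$) process with autoregressive polynomial

$$ z \mapsto P(z) := \mathbf{1}_d z^p + A_1 z^{p-1} + \ldots + A_p \in M_d(\mathbb{R}[z]) \tag{3.2a} $$

and moving average polynomial

$$ z \mapsto Q(z) := B_0 z^q + B_1 z^{q-1} + \ldots + B_q \in M_{d,m}(\mathbb{R}[z]) \tag{3.2b} $$

is defined as the solution to the formal differential equation $P(D) Y(t) = Q(D) D L(t)$, $D \equiv d/dt$. It is often useful to allow for the dimensions of the driving Lévy process $L$ and the $L$-driven MCARMA process to be different, which is a slight extension of the original definition of Marquardt and Stelzer (2007). The results obtained in that paper remain true if our definition is used. In general, the paths of a Lévy process are not differentiable, so we interpret the defining differential equation as being equivalent to the state space representation

$$ dG(t) = \mathcal{A} G(t) dt + \mathcal{B} dL(t), \quad Y(t) = \mathcal{C} G(t), \quad t \in \mathbb{R}, \tag{3.3} $$

where $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$ are given by

$$ \mathcal{A} = \begin{pmatrix} 0 & \mathbf{1}_d & 0 & \cdots & 0 \\ 0 & 0 & \mathbf{1}_d & \ddots & \vdots \\ \vdots & & \ddots & \ddots & 0 \\ 0 & \cdots & \cdots & 0 & \mathbf{1}_d \\ -A_p & -A_{p-1} & \cdots & \cdots & -A_1 \end{pmatrix} \in M_{pd}(\mathbb{R}), \tag{3.4a} $$

$$ \mathcal{B} = \begin{pmatrix} \beta_1^T & \cdots & \beta_p^T \end{pmatrix}^T \in M_{pd,m}(\mathbb{R}), \quad \beta_{p-j} = -I_{\{0,\ldots,q\}}(j) \left[ \sum_{i=1}^{p-j-1} A_i \beta_{p-j-i} - B_{q-j} \right], \tag{3.4b} $$

$$ \mathcal{C} = (\mathbf{1}_d, 0_d, \ldots, 0_d) \in M_{d,pd}(\mathbb{R}). \tag{3.4c} $$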



It follows from representation (3.3) that MCARMA processes are special cases of linear multivariate con- 
tinuous-time state space models, and in fact, the class of linear state space models is equivalent to the class 
of MCARMA models (Schlemm and Stelzer, 2012, Corollary 3.4). By considering the class of linear state 
space models, one can define representations of MCARMA processes which are different from Eq. (3.3) 
and better suited for the purpose of estimation. 
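As a small illustration of the companion-form representation (3.3)-(3.4), the following sketch builds the matrices for a scalar CARMA(2,1) model ($d = m = 1$, $p = 2$, $q = 1$) and checks numerically that $\mathcal{C}(z\mathbf{1} - \mathcal{A})^{-1}\mathcal{B}$ agrees with $Q(z)/P(z)$ at a test point. The function names and the coefficient values are hypothetical choices made for this example only.

```python
import numpy as np

def carma21_state_space(a1, a2, b0, b1):
    """Companion-form matrices of Eq. (3.4) for a scalar CARMA(2,1) model
    with P(z) = z^2 + a1 z + a2 and Q(z) = b0 z + b1 (illustrative helper)."""
    A = np.array([[0.0, 1.0], [-a2, -a1]])
    beta1 = b0                # j = 1 in Eq. (3.4b): empty sum, so beta_1 = B_0
    beta2 = b1 - a1 * beta1   # j = 0 in Eq. (3.4b): beta_2 = B_1 - A_1 beta_1
    B = np.array([[beta1], [beta2]])
    C = np.array([[1.0, 0.0]])
    return A, B, C

def transfer(A, B, C, z):
    """Evaluate C (z 1 - A)^{-1} B at the complex point z (scalar output)."""
    n = A.shape[0]
    return (C @ np.linalg.solve(z * np.eye(n) - A, B))[0, 0]

# Hypothetical coefficients; P has roots -1 and -2, Q has root -0.5.
a1, a2, b0, b1 = 3.0, 2.0, 1.0, 0.5
A, B, C = carma21_state_space(a1, a2, b0, b1)

z = 0.7 + 1.3j
lhs = transfer(A, B, C, z)
rhs = (b0 * z + b1) / (z**2 + a1 * z + a2)
print(abs(lhs - rhs))  # difference between the two evaluations
```

The agreement reflects that the state space model (3.3) and the pair $(P, Q)$ describe the same input-output relation.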

Definition 3.2. A continuous-time linear state space model $(A, B, C, L)$ of dimension $N$ with values in $\mathbb{R}^d$ is characterized by an $\mathbb{R}^m$-valued driving Lévy process $L$, a state transition matrix $A \in M_N(\mathbb{R})$, an input matrix $B \in M_{N,m}(\mathbb{R})$, and an observation matrix $C \in M_{d,N}(\mathbb{R})$. It consists of a state equation of Ornstein-Uhlenbeck type

$$ dX(t) = A X(t) dt + B dL(t), \quad t \in \mathbb{R}, \tag{3.5a} $$

and an observation equation

$$ Y(t) = C X(t), \quad t \in \mathbb{R}. \tag{3.5b} $$

The $\mathbb{R}^N$-valued process $X = (X(t))_{t \in \mathbb{R}}$ is the state vector process, and $Y = (Y(t))_{t \in \mathbb{R}}$ the output process.

A solution $Y$ to Eq. (3.5) is called causal if, for all $t$, $Y(t)$ is independent of the $\sigma$-algebra generated by $\{L(s) : s > t\}$. Every solution to Eq. (3.5a) satisfies

$$ X(t) = e^{A(t-s)} X(s) + \int_s^t e^{A(t-u)} B \, dL(u), \quad \forall s, t \in \mathbb{R}, \quad s < t. \tag{3.6} $$



The following can be seen as the multivariate extension of Brockwell, Davis and Yang (2011, Proposition 
1) and recalls conditions for the existence of a stationary causal solution of the state equation (3.5a) for 
easy reference. We always work under the following assumption. 



Assumption E. The eigenvalues of the matrix $A$ have strictly negative real parts.

3.2. Second order structure and the transfer function

Proposition 3.1 (Sato and Yamazato (1983, Theorem 5.1)). If Assumptions E and L1 hold, then Eq. (3.5a) has a unique strictly stationary, causal solution $X$ given by $X(t) = \int_{-\infty}^{t} e^{A(t-u)} B \, dL(u)$. Moreover, $X(t)$ has mean zero and second-order structure

$$ \operatorname{Var}(X(t)) =: \Gamma_0 = \int_0^\infty e^{Au} B \Sigma^L B^T e^{A^T u} du, \tag{3.7a} $$

$$ \operatorname{Cov}(X(t+h), X(t)) =: \gamma_X(h) = e^{Ah} \Gamma_0, \quad h \geq 0, \tag{3.7b} $$

where the variance $\Gamma_0$ satisfies $A \Gamma_0 + \Gamma_0 A^T = -B \Sigma^L B^T$.
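The two characterizations of the stationary variance in Proposition 3.1, the integral (3.7a) and the Lyapunov equation, can be compared numerically. The sketch below is only an illustration: the matrices are hypothetical example values, the integral is truncated at $u = 20$, and the trapezoidal rule is an arbitrary choice of quadrature.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Hypothetical example matrices (not from the paper); A satisfies Assumption E.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
Sigma_L = np.array([[2.0]])   # covariance matrix of L(1)
Q = B @ Sigma_L @ B.T

# Gamma_0 as the solution of A G + G A^T = -B Sigma_L B^T.
Gamma0 = solve_continuous_lyapunov(A, -Q)

def integrand(s):
    """e^{As} B Sigma_L B^T e^{A^T s}, the integrand of Eq. (3.7a)."""
    E = expm(A * s)
    return E @ Q @ E.T

# Gamma_0 as the integral (3.7a), truncated at u = 20, trapezoidal rule.
u = np.linspace(0.0, 20.0, 4001)
du = u[1] - u[0]
vals = np.array([integrand(s) for s in u])
Gamma0_quad = du * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)

def gamma_X(h):
    """Autocovariance Cov(X(t+h), X(t)) = e^{Ah} Gamma_0 for h >= 0, Eq. (3.7b)."""
    return expm(A * h) @ Gamma0

print(np.max(np.abs(Gamma0 - Gamma0_quad)))  # quadrature error
```

In practice the Lyapunov solve is the cheaper and more accurate route; the quadrature is included only as a cross-check.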

It is an immediate consequence that the output process $Y$ has mean zero and autocovariance function $\mathbb{R} \ni h \mapsto \gamma_Y(h)$ given by $\gamma_Y(h) = C e^{Ah} \Gamma_0 C^T$, $h \geq 0$, and that $Y$ itself can be written succinctly as a moving average of the driving Lévy process as $Y(t) = \int_{-\infty}^{\infty} g(t-u) \, dL(u)$, where $g(t) = C e^{At} B I_{[0,\infty)}(t)$. This representation shows that the behaviour of the process $Y$ depends on the values of the individual matrices $A$, $B$, and $C$ only through the products $C e^{At} B$, $t \in \mathbb{R}$. The following lemma relates this analytical statement to an algebraic one about rational matrices, allowing us to draw a connection to the identifiability theory of discrete-time state space models.

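Lemma 3.2. Two matrix triplets $(A, B, C)$, $(\tilde{A}, \tilde{B}, \tilde{C})$ of appropriate dimensions satisfy $C e^{At} B = \tilde{C} e^{\tilde{A}t} \tilde{B}$ for all $t \in \mathbb{R}$ if and only if $C(z\mathbf{1} - A)^{-1} B = \tilde{C}(z\mathbf{1} - \tilde{A})^{-1} \tilde{B}$ for all $z \in \mathbb{C}$.

Proof. If we start at the first equality and replace the matrix exponentials by their spectral representations (see Lax, 2002, Theorem 17.5), we obtain $\int_\gamma e^{zt} C(z\mathbf{1} - A)^{-1} B \, dz = \int_{\tilde{\gamma}} e^{zt} \tilde{C}(z\mathbf{1} - \tilde{A})^{-1} \tilde{B} \, dz$, where $\gamma$ is a closed contour in $\mathbb{C}$ winding around each eigenvalue of $A$ exactly once, and likewise for $\tilde{\gamma}$. Since we can always assume that $\gamma = \tilde{\gamma}$ by taking $\gamma$ to be $R$ times the unit circle, $R > \max\{|\lambda| : \lambda \in \sigma_A \cup \sigma_{\tilde{A}}\}$, it follows that, for each $t \in \mathbb{R}$, $\int_\gamma e^{zt} \left[ C(z\mathbf{1} - A)^{-1} B - \tilde{C}(z\mathbf{1} - \tilde{A})^{-1} \tilde{B} \right] dz = 0$. Since the rational matrix function $\Delta(z) = C(z\mathbf{1} - A)^{-1} B - \tilde{C}(z\mathbf{1} - \tilde{A})^{-1} \tilde{B}$ has only poles with modulus less than $R$, it has an expansion around infinity, $\Delta(z) = \sum_{n=0}^{\infty} A_n z^{-n}$, $A_n \in M_{d,m}(\mathbb{C})$, which converges in a region $\{z \in \mathbb{C} : |z| > r\}$ containing $\gamma$. Using the fact that this series converges uniformly on the compact set $\gamma$ and applying the Residue Theorem from complex analysis, which implies $\int_\gamma e^{zt} z^{-n-1} dz = 2\pi i \, t^n / n!$, one sees that $\sum_{n=0}^{\infty} \frac{t^n}{n!} A_{n+1} \equiv 0_{d,m}$. Consequently, by the Identity Theorem, $A_n$ is the zero matrix for all $n \geq 1$, and since $\Delta(z) \to 0$ as $z \to \infty$, it follows that $\Delta(z) \equiv 0_{d,m}$. □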

The rational matrix function $H : z \mapsto C(z\mathbf{1}_N - A)^{-1} B$ is called the transfer function of the state space model (3.5) and is closely related to the spectral density $f_Y$ of the output process $Y$, which is defined as $f_Y(\omega) = (2\pi)^{-1} \int_{\mathbb{R}} e^{-i\omega h} \gamma_Y(h) \, dh$, the (normalized) Fourier transform of $\gamma_Y$. Before we make this relation explicit, we prove the following lemma.

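Lemma 3.3. For any real number $v$, and matrices $A$, $B$, $\Sigma^L$, $\Gamma_0$ as in Eq. (3.7a), it holds that

$$ \int_{-v}^{\infty} e^{Au} B \Sigma^L B^T e^{A^T u} du = e^{-Av} \Gamma_0 e^{-A^T v}. \tag{3.8} $$

Proof. We define functions $l, r : \mathbb{R} \to M_N(\mathbb{R})$ by $l(v) = \int_{-v}^{\infty} e^{Au} B \Sigma^L B^T e^{A^T u} du$ and $r(v) = e^{-Av} \Gamma_0 e^{-A^T v}$. Both $l : v \mapsto l(v)$ and $r : v \mapsto r(v)$ are differentiable functions of $v$, satisfying

$$ \frac{d}{dv} l(v) = e^{-Av} B \Sigma^L B^T e^{-A^T v} \quad \text{and} \quad \frac{d}{dv} r(v) = -A e^{-Av} \Gamma_0 e^{-A^T v} - e^{-Av} \Gamma_0 A^T e^{-A^T v}. $$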



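Using the relation $A \Gamma_0 + \Gamma_0 A^T = -B \Sigma^L B^T$ from Proposition 3.1 one sees immediately that $(d/dv) l(v) = (d/dv) r(v)$, for all $v \in \mathbb{R}$. Hence, $l$ and $r$ differ only by an additive constant. Since $l(0)$ equals $r(0)$ by the definition of $\Gamma_0$, the constant is zero, and $l(v) = r(v)$ for all real numbers $v$. □

Proposition 3.4. Let $Y$ be the output process of the state space model (3.5), and denote by $H : z \mapsto C(z\mathbf{1}_N - A)^{-1} B$ its transfer function. Then the relation $f_Y(\omega) = (2\pi)^{-1} H(i\omega) \Sigma^L H(-i\omega)^T$ holds for all real $\omega$; in particular, $\omega \mapsto f_Y(\omega)$ is a rational matrix function.

Proof. First, we recall (Bernstein, 2005, Proposition 11.2.2) that the Laplace transform of the matrix exponential of $A$ is given by the resolvent, that is, $(z\mathbf{1} - A)^{-1} = \int_0^\infty e^{-zu} e^{Au} du$, for any complex number $z$ whose real part exceeds the real parts of all eigenvalues of $A$; under Assumption E this covers every purely imaginary $z$. We are now ready to compute

$$ \frac{1}{2\pi} H(i\omega) \Sigma^L H(-i\omega)^T = \frac{1}{2\pi} C \left[ \int_0^\infty e^{-i\omega u} e^{Au} du \, B \Sigma^L B^T \int_0^\infty e^{i\omega v} e^{A^T v} dv \right] C^T. $$

Introducing the new variable $h = u - v$, and using Lemma 3.3, this becomes

$$ \frac{1}{2\pi} C \left[ \int_0^\infty \int_0^\infty e^{-i\omega h} e^{Ah} e^{Av} B \Sigma^L B^T e^{A^T v} dv \, dh + \int_{-\infty}^0 \int_{-h}^\infty e^{-i\omega h} e^{Ah} e^{Av} B \Sigma^L B^T e^{A^T v} dv \, dh \right] C^T $$
$$ = \frac{1}{2\pi} C \left[ \int_0^\infty e^{-i\omega h} e^{Ah} \Gamma_0 \, dh + \int_{-\infty}^0 e^{-i\omega h} \Gamma_0 e^{-A^T h} \, dh \right] C^T. $$

By Eq. (3.7b) and the fact that the spectral density and the autocovariance function of a stochastic process are Fourier duals of each other, the last expression is equal to $(2\pi)^{-1} \int_{-\infty}^{\infty} e^{-i\omega h} \gamma_Y(h) \, dh = f_Y(\omega)$, which completes the proof. □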
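The identity of Proposition 3.4 lends itself to a numerical sanity check for a scalar output process. The sketch below compares $(2\pi)^{-1} H(i\omega) \Sigma^L H(-i\omega)^T$ with a truncated trapezoidal approximation of $(2\pi)^{-1} \int e^{-i\omega h} \gamma_Y(h) \, dh$, using the symmetry $\gamma_Y(-h) = \gamma_Y(h)$ valid for $d = 1$. All matrices, the truncation point and the grid size are assumptions made for this illustration.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

# Hypothetical example model with scalar output (d = m = 1, N = 2).
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])
Sigma_L = np.array([[2.0]])
Gamma0 = solve_continuous_lyapunov(A, -(B @ Sigma_L @ B.T))

def f_transfer(omega):
    """(2 pi)^{-1} H(i omega) Sigma_L H(-i omega)^T, with H the transfer function."""
    eye = np.eye(A.shape[0])
    H_plus = C @ np.linalg.solve(1j * omega * eye - A, B)
    H_minus = C @ np.linalg.solve(-1j * omega * eye - A, B)
    return ((H_plus @ Sigma_L @ H_minus.T)[0, 0] / (2 * np.pi)).real

# Tabulate gamma_Y(h) = C e^{Ah} Gamma_0 C^T on a grid, reusing one matrix exponential.
h_grid = np.linspace(0.0, 30.0, 20001)
dh = h_grid[1] - h_grid[0]
step = expm(A * dh)
E = np.eye(A.shape[0])
gamma_Y = np.empty(h_grid.size)
for k in range(h_grid.size):
    gamma_Y[k] = (C @ E @ Gamma0 @ C.T)[0, 0]
    E = step @ E

def f_fourier(omega):
    """(1/pi) int_0^infty cos(omega h) gamma_Y(h) dh, truncated, trapezoidal rule."""
    w = np.full(h_grid.size, dh)
    w[0] = w[-1] = dh / 2
    return np.sum(w * np.cos(omega * h_grid) * gamma_Y) / np.pi

for omega in (0.0, 0.7, 2.0):
    print(omega, f_transfer(omega), f_fourier(omega))
```

The rational expression is exact, whereas the Fourier integral carries truncation and discretization error, so the two columns agree only up to the quadrature accuracy.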



A converse of Proposition 3.4, which will be useful in our later discussion of identifiability, is the 
Spectral Factorization Theorem. Its proof can be found in Rozanov (1967, Theorem 1.10.1). 

Theorem 3.5. Every positive definite rational matrix function $f \in \mathbb{S}_d^{+}(\mathbb{C}\{\omega\})$ of full rank can be factorized as $f(\omega) = (2\pi)^{-1} W(i\omega) W(-i\omega)^T$, where the rational matrix function $z \mapsto W(z) \in M_{d,N}(\mathbb{R}\{z\})$ has full rank and is, for fixed $N$, uniquely determined up to an orthogonal transformation $W(z) \mapsto W(z) O$, for some orthogonal $N \times N$ matrix $O$.



3.3. Equidistant observations 

We now turn to properties of the sampled process $Y^{(h)} = (Y_n^{(h)})_{n \in \mathbb{Z}}$ which is defined by $Y_n^{(h)} = Y(nh)$ and represents observations of the process $Y$ at equally spaced points in time. A fundamental observation is that the linear state space structure of the continuous-time process is preserved under sampling, as detailed in the following proposition. Of particular importance is the explicit formula (3.10) for the spectral density of the sampled process $Y^{(h)}$.

Proposition 3.6 (partly Schlemm and Stelzer (2012, Lemma 5.1)). Assume that $Y$ is the output process of the state space model (3.5). Then the sampled process $Y^{(h)}$ has the state space representation

$$ X_n = e^{Ah} X_{n-1} + N_n^{(h)}, \quad N_n^{(h)} = \int_{(n-1)h}^{nh} e^{A(nh-u)} B \, dL(u), \quad Y_n^{(h)} = C X_n. \tag{3.9} $$

The sequence $(N_n^{(h)})_{n \in \mathbb{Z}}$ is i.i.d. with mean zero and covariance matrix $\Sigma^{(h)} = \int_0^h e^{Au} B \Sigma^L B^T e^{A^T u} du$. Moreover, the spectral density of $Y^{(h)}$, denoted by $f_Y^{(h)}$, is given by

$$ f_Y^{(h)}(\omega) = \frac{1}{2\pi} C \left( e^{i\omega} \mathbf{1}_N - e^{Ah} \right)^{-1} \Sigma^{(h)} \left( e^{-i\omega} \mathbf{1}_N - e^{A^T h} \right)^{-1} C^T; \tag{3.10} $$

in particular, $f_Y^{(h)} : [-\pi, \pi] \to \mathbb{S}_d^{+}(\mathbb{R}\{e^{i\omega}\})$ is a rational matrix function.

Proof. The first part is Schlemm and Stelzer (2012, Lemma 5.1) and Expression (3.10) follows from Hamilton (1994, Eq. (10.4.43)). □
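For computations one needs the transition matrix $e^{Ah}$ and the covariance matrix of the noise sequence in Proposition 3.6. A minimal sketch follows; it relies on the identity $\int_0^h e^{Au} B \Sigma^L B^T e^{A^T u} du = \Gamma_0 - e^{Ah} \Gamma_0 e^{A^T h}$, which is not stated in the text but follows by splitting the integral (3.7a) at $u = h$ under Assumption E. The helper name `sample_state_space` and the example matrices are hypothetical.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def sample_state_space(A, B, Sigma_L, h):
    """Return (F, Sigma_h): F = e^{Ah} and the covariance matrix of the
    discrete-time noise, computed as Gamma_0 - F Gamma_0 F^T (A must be stable)."""
    Q = B @ Sigma_L @ B.T
    Gamma0 = solve_continuous_lyapunov(A, -Q)   # A G + G A^T = -Q
    F = expm(A * h)
    return F, Gamma0 - F @ Gamma0 @ F.T

# Hypothetical example values.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
Sigma_L = np.array([[2.0]])
h = 0.5
F, Sigma_h = sample_state_space(A, B, Sigma_L, h)

# Cross-check against a trapezoidal approximation of the defining integral.
Q = B @ Sigma_L @ B.T
u = np.linspace(0.0, h, 2001)
du = u[1] - u[0]
vals = np.array([expm(A * s) @ Q @ expm(A * s).T for s in u])
Sigma_h_quad = du * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)
print(np.max(np.abs(Sigma_h - Sigma_h_quad)))
```

Other routes to the same matrix exist (for example block-matrix exponentials); the Lyapunov-based formula is used here only because it is short.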
In the following we derive conditions for the sampled state space model (3.9) to be minimal in the sense that the process $Y^{(h)}$ is not the output process of any state space model of dimension less than $N$, and for the covariance matrix of the noise sequence in Proposition 3.6 to be non-singular. We begin by recalling some well-known notions from discrete-time realization and control theory. For a detailed account we refer to Åström (1970); Sontag (1998), which also explain the origin of the terminology.

Definition 3.3. Let $H \in M_{d,m}(\mathbb{R}\{z\})$ be a rational matrix function. A matrix triple $(A, B, C)$ is called an algebraic realization of $H$ of dimension $N$ if $H(z) = C(z\mathbf{1}_N - A)^{-1} B$, where $A \in M_N(\mathbb{R})$, $B \in M_{N,m}(\mathbb{R})$, and $C \in M_{d,N}(\mathbb{R})$.

Every rational matrix function has many algebraic realizations of various dimensions. A particularly convenient class are the ones of minimal dimension, which have a number of useful properties.

Definition 3.4. Let $H \in M_{d,m}(\mathbb{R}\{z\})$ be a rational matrix function. A minimal realization of $H$ is an algebraic realization of $H$ of dimension smaller than or equal to the dimension of every other algebraic realization of $H$. The dimension of a minimal realization of $H$ is the McMillan degree of $H$.

Two other important properties of algebraic realizations, which are related to the notion of minimality and play a key role in the study of identifiability, are introduced in the following definitions.

Definition 3.5. An algebraic realization $(A, B, C)$ of dimension $N$ is controllable if the controllability matrix $\mathscr{C} = \begin{bmatrix} B & AB & \cdots & A^{N-1}B \end{bmatrix} \in M_{N,mN}(\mathbb{R})$ has full rank.

Definition 3.6. An algebraic realization $(A, B, C)$ of dimension $N$ is observable if the observability matrix $\mathscr{O} = \begin{bmatrix} C^T & (CA)^T & \cdots & (CA^{N-1})^T \end{bmatrix}^T \in M_{dN,N}(\mathbb{R})$ has full rank.

We will often say that a state space system (3.5) is minimal, controllable or observable if the corresponding transfer function has this property. In the context of ARMA processes these concepts have been used to investigate the non-singularity of the Fisher information matrix (Klein and Spreij, 2006). The next theorem characterizes minimality in terms of controllability and observability.

Theorem 3.7 (Hannan and Deistler (1988, Theorem 2.3.3)). A realization $(A, B, C)$ is minimal if and only if it is both controllable and observable.
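Theorem 3.7 suggests a simple numerical test of minimality through the ranks of the controllability and observability matrices of Definitions 3.5 and 3.6. The sketch below is illustrative only: the function names are hypothetical, and the rank is determined with the default tolerance of `numpy.linalg.matrix_rank`, which may be unreliable for badly scaled matrices.

```python
import numpy as np

def controllability_matrix(A, B):
    """[B, AB, ..., A^{N-1} B] as in Definition 3.5."""
    N = A.shape[0]
    return np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(N)])

def observability_matrix(A, C):
    """[C^T, (CA)^T, ..., (CA^{N-1})^T]^T as in Definition 3.6."""
    N = A.shape[0]
    return np.vstack([C @ np.linalg.matrix_power(A, k) for k in range(N)])

def is_minimal(A, B, C):
    """Minimality check via Theorem 3.7: controllable and observable."""
    N = A.shape[0]
    controllable = np.linalg.matrix_rank(controllability_matrix(A, B)) == N
    observable = np.linalg.matrix_rank(observability_matrix(A, C)) == N
    return bool(controllable and observable)

# Hypothetical scalar CARMA(2,1) examples in companion form with
# P(z) = z^2 + 3z + 2 = (z + 1)(z + 2).
A = np.array([[0.0, 1.0], [-2.0, -3.0]])
C = np.array([[1.0, 0.0]])
B_coprime = np.array([[1.0], [-2.5]])   # Q(z) = z + 0.5, no common root with P
B_common = np.array([[1.0], [-2.0]])    # Q(z) = z + 1 cancels the root -1 of P

print(is_minimal(A, B_coprime, C), is_minimal(A, B_common, C))
```

The second example loses controllability because the common factor $z + 1$ lowers the McMillan degree of the transfer function from 2 to 1.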

Lemma 3.8. For all matrices $A \in M_N(\mathbb{R})$, $B \in M_{N,m}(\mathbb{R})$, $\Sigma \in \mathbb{S}_m^{++}(\mathbb{R})$, and every real number $t > 0$, the linear subspaces $\operatorname{im} \left[ B, AB, \ldots, A^{N-1}B \right]$ and $\operatorname{im} \int_0^t e^{Au} B \Sigma B^T e^{A^T u} du$ are equal.

Proof. The assertion is a straightforward generalization of Bernstein (2005, Lemma 12.6.2). □

Corollary 3.9. If the triple $(A, B, C)$ is minimal of dimension $N$, and $\Sigma$ is positive definite, then the $N \times N$ matrix $\Sigma^{(h)} = \int_0^h e^{Au} B \Sigma B^T e^{A^T u} du$ has full rank $N$.

Proof. By Theorem 3.7, minimality of $(A, B, C)$ implies controllability, and by Lemma 3.8, this is equivalent to $\Sigma^{(h)}$ having full rank. □

Proposition 3.10. Assume that $Y$ is the $d$-dimensional output process of the state space model (3.5) with $(A, B, C)$ being a minimal realization of McMillan degree $N$. Then a sufficient condition for the sampled process $Y^{(h)}$ to have the same McMillan degree is the Kalman-Bertram criterion

$$ \lambda - \lambda' \neq 2 h^{-1} \pi i k, \quad \forall (\lambda, \lambda') \in \sigma(A) \times \sigma(A), \quad \forall k \in \mathbb{Z} \setminus \{0\}. \tag{3.11} $$

Proof. We will prove the assertion by showing that the $N$-dimensional state space representation (3.9) is both controllable and observable, and thus, by Theorem 3.7, minimal. Observability has been shown in Sontag (1998, Proposition 5.2.11) using the Hautus criterion (Hautus, 1969). The key ingredient in the proof of controllability is Corollary 3.9, where we showed that the covariance matrix of $N_n^{(h)}$, given in Proposition 3.6, has full rank; this shows that the representation (3.9) is indeed minimal and completes the proof. □

Since, by Hannan and Deistler (1988, Theorem 2.3.4), minimal realizations are unique up to a change of basis $(A, B, C) \mapsto (TAT^{-1}, TB, CT^{-1})$, for some non-singular $N \times N$ matrix $T$, and such a transformation does not change the eigenvalues of $A$, the criterion (3.11) does not depend on what particular triple $(A, B, C)$ one chooses. Uniqueness of the principal logarithm implies the following.

Lemma 3.11. Assume that the matrices $A, B \in M_N(\mathbb{R})$ satisfy $e^{hA} = e^{hB}$ for some $h > 0$. If the spectra $\sigma_A$, $\sigma_B$ of $A$, $B$ satisfy $|\operatorname{Im} \lambda| < \pi/h$ for all $\lambda \in \sigma_A \cup \sigma_B$, then $A = B$.

Lemma 3.12. Assume that $A \in M_N(\mathbb{R})$ satisfies Assumption E. For every $h > 0$, the linear map $\mathscr{M} : M_N(\mathbb{R}) \to M_N(\mathbb{R})$, $M \mapsto \int_0^h e^{Au} M e^{A^T u} du$ is injective.

Proof. If we apply the vectorization operator $\operatorname{vec} : M_N(\mathbb{R}) \to \mathbb{R}^{N^2}$ and use the well-known identity (Bernstein, 2005, Proposition 7.1.9) $\operatorname{vec}(UVW) = (W^T \otimes U) \operatorname{vec}(V)$ for matrices $U$, $V$ and $W$ of appropriate dimensions, we obtain the induced linear operator

$$ \operatorname{vec} \circ \mathscr{M} \circ \operatorname{vec}^{-1} : \mathbb{R}^{N^2} \to \mathbb{R}^{N^2}, \quad \operatorname{vec} M \mapsto \int_0^h e^{Au} \otimes e^{Au} du \, \operatorname{vec} M. $$

To prove the claim that the operator $\mathscr{M}$ is injective, it is thus sufficient to show that the matrix $\mathscr{A} := \int_0^h e^{Au} \otimes e^{Au} du \in M_{N^2}(\mathbb{R})$ is non-singular. We write $A \oplus A := A \otimes \mathbf{1}_N + \mathbf{1}_N \otimes A$. By Bernstein (2005, Fact 11.14.37), $\mathscr{A} = \int_0^h e^{(A \oplus A)u} du$ and since $\sigma(A \oplus A) = \{\lambda + \mu : \lambda, \mu \in \sigma(A)\}$ (Bernstein, 2005, Proposition 7.2.3), Assumption E implies that all eigenvalues of the matrix $A \oplus A$ have strictly negative real parts; in particular, $A \oplus A$ is invertible. Consequently, it follows from Bernstein (2005, Fact 11.13.14) that $\mathscr{A} = (A \oplus A)^{-1} \left[ e^{(A \oplus A)h} - \mathbf{1}_{N^2} \right]$. Since, for any matrix $M$, it holds that $\sigma(e^M) = \{e^\lambda, \lambda \in \sigma(M)\}$ (Bernstein, 2005, Proposition 11.2.3), the spectrum of $e^{(A \oplus A)h}$ is a subset of the open unit disk, and it follows that $\mathscr{A}$ is invertible. □
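The vectorization argument in the proof of Lemma 3.12 can be traced numerically. The following sketch forms the Kronecker sum $A \oplus A$, evaluates $(A \oplus A)^{-1}[e^{(A \oplus A)h} - \mathbf{1}_{N^2}]$, and compares its action on $\operatorname{vec} M$ with a trapezoidal approximation of $\int_0^h e^{Au} M e^{A^T u} du$. The matrices $A$ and $M$ are arbitrary example values, and $\operatorname{vec}$ is taken to stack columns, which corresponds to `order="F"` in NumPy.

```python
import numpy as np
from scipy.linalg import expm

# Hypothetical example values; A satisfies Assumption E.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
M = np.array([[1.0, 2.0], [3.0, 4.0]])
h = 0.5
N = A.shape[0]
eye = np.eye(N)

# Kronecker sum A (+) A and the matrix of the induced operator on vec M.
ksum = np.kron(A, eye) + np.kron(eye, A)
op = np.linalg.solve(ksum, expm(ksum * h) - np.eye(N * N))

# Image of M under the map of Lemma 3.12, computed through vec (column stacking).
image_vec = (op @ M.flatten(order="F")).reshape((N, N), order="F")

# The same image by direct quadrature of int_0^h e^{Au} M e^{A^T u} du.
u = np.linspace(0.0, h, 2001)
du = u[1] - u[0]
vals = np.array([expm(A * s) @ M @ expm(A * s).T for s in u])
image_quad = du * (vals[0] / 2 + vals[1:-1].sum(axis=0) + vals[-1] / 2)

print(np.max(np.abs(image_vec - image_quad)))
print(np.linalg.matrix_rank(op))  # full rank N^2 means the map is injective
```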



3.4. Overcoming the aliasing effect 

One goal in this paper is the estimation of multivariate CARMA processes or, equivalently, continuous-time state space models, based on discrete observations. In this brief section we concentrate on the issue of identifiability, and we derive sufficient conditions that prevent redundancies from being introduced into an otherwise properly specified model by the process of sampling, an effect known as aliasing (Hansen and Sargent, 1983).

For ease of notation we choose to parametrize the state matrix, the input matrix, and the observation matrix of the state space model (3.5), as well as the driving Lévy process $L$; from these one can always obtain an autoregressive and a moving average polynomial which describe the same process by applying a left matrix fraction decomposition to the corresponding transfer function. We hence assume that there is some compact parameter set $\Theta \subset \mathbb{R}^r$, and that, for each $\vartheta \in \Theta$, one is given matrices $A_\vartheta$, $B_\vartheta$ and $C_\vartheta$ of matching dimensions, as well as a Lévy process $L_\vartheta$. A basic assumption is that we always work with second order processes (cf. Assumption L1).

Assumption C1. For each $\vartheta \in \Theta$, it holds that $\mathbb{E} L_\vartheta = 0_m$, that $\mathbb{E} \|L_\vartheta(1)\|^2$ is finite, and that the covariance matrix $\Sigma^L_\vartheta = \mathbb{E} L_\vartheta(1) L_\vartheta(1)^T$ is non-singular.

To ensure that the model corresponding to $\vartheta$ describes a stationary output process we impose the analogue of Assumption E.

Assumption C2. For each $\vartheta \in \Theta$, the eigenvalues of $A_\vartheta$ have strictly negative real parts.

Next, we restrict the model class to minimal algebraic realizations of a fixed McMillan degree.

Assumption C3. For all $\vartheta \in \Theta$, the triple $(A_\vartheta, B_\vartheta, C_\vartheta)$ is minimal with McMillan degree $N$.

Since we shall base the inference on a QML approach and thus on second-order properties of the observed process, we require the model class to be identifiable from this available information according to the following definitions.

Definition 3.7. Two stochastic processes, irrespective of whether their index sets are continuous or discrete, are $L^2$-observationally equivalent if their spectral densities are the same.

Definition 3.8. A family $(Y_\vartheta, \vartheta \in \Theta)$ of continuous-time stochastic processes is identifiable from the spectral density if, for every $\vartheta_1 \neq \vartheta_2$, the two processes $Y_{\vartheta_1}$ and $Y_{\vartheta_2}$ are not $L^2$-observationally equivalent. It is $h$-identifiable from the spectral density, $h > 0$, if, for every $\vartheta_1 \neq \vartheta_2$, the two sampled processes $Y^{(h)}_{\vartheta_1}$ and $Y^{(h)}_{\vartheta_2}$ are not $L^2$-observationally equivalent.

Assumption C4. The collection of output processes $K(\Theta) := (Y_\vartheta, \vartheta \in \Theta)$ corresponding to the state space models $(A_\vartheta, B_\vartheta, C_\vartheta, L_\vartheta)$ is identifiable from the spectral density.

Since we shall use only discrete, $h$-spaced observations of $Y$, it would seem more natural to impose the stronger requirement that $K(\Theta)$ be $h$-identifiable. We will see, however, that this is implied by the previous assumptions if we additionally assume that the following holds.

Assumption C5. For all $\vartheta \in \Theta$, the spectrum of $A_\vartheta$ is a subset of $\{z \in \mathbb{C} : -\pi/h < \operatorname{Im} z < \pi/h\}$.
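The role of Assumption C5 can be seen in a two-dimensional example of aliasing: two state matrices whose eigenvalues differ by $2\pi i / h$ in their imaginary parts have the same matrix exponential $e^{Ah}$ and are therefore indistinguishable from the transition matrix of the sampled model. The matrices below and the helper `satisfies_C5` are hypothetical constructions for this illustration.

```python
import numpy as np
from scipy.linalg import expm

h = 1.0

def rot_decay(lam, w):
    """2 x 2 matrix with eigenvalues -lam + i w and -lam - i w."""
    return np.array([[-lam, -w], [w, -lam]])

def satisfies_C5(A, h):
    """Check that all eigenvalues of A lie in the strip |Im z| < pi / h."""
    return bool(np.max(np.abs(np.linalg.eigvals(A).imag)) < np.pi / h)

A1 = rot_decay(0.5, 1.0)                    # eigenvalues inside the strip
A2 = rot_decay(0.5, 1.0 + 2 * np.pi / h)    # imaginary parts shifted by 2 pi / h

same_transition = np.allclose(expm(A1 * h), expm(A2 * h))
print(same_transition, satisfies_C5(A1, h), satisfies_C5(A2, h))
```

Restricting the parameter space so that only matrices like `A1` are admissible removes this ambiguity, which is the content of Lemma 3.11.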

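Theorem 3.13 (Identifiability). Assume that $\Theta \ni \vartheta \mapsto (A_\vartheta, B_\vartheta, C_\vartheta, \Sigma^L_\vartheta)$ is a parametrization of continuous-time state space models satisfying Assumptions C1 to C5. Then the corresponding collection of output processes $K(\Theta)$ is $h$-identifiable from the spectral density.

Proof. We will show that for every $\vartheta_1, \vartheta_2 \in \Theta$, $\vartheta_1 \neq \vartheta_2$, the sampled output processes $Y^{(h)}(\vartheta_1)$ and $Y^{(h)}(\vartheta_2)$ are not $L^2$-observationally equivalent. Suppose, for the sake of contradiction, that the spectral densities of the sampled output processes were the same. Then the Spectral Factorization Theorem (Theorem 3.5) would imply that there exists an orthogonal $N \times N$ matrix $O$ such that

$$ C_{\vartheta_1} \left( e^{i\omega} \mathbf{1}_N - e^{A_{\vartheta_1} h} \right)^{-1} \left( \Sigma^{(h)}_{\vartheta_1} \right)^{1/2} O = C_{\vartheta_2} \left( e^{i\omega} \mathbf{1}_N - e^{A_{\vartheta_2} h} \right)^{-1} \left( \Sigma^{(h)}_{\vartheta_2} \right)^{1/2}, \quad -\pi \leq \omega \leq \pi, $$

where $(\Sigma^{(h)}_{\vartheta_i})^{1/2}$ are the unique positive definite matrix square roots of the matrices $\Sigma^{(h)}_{\vartheta_i} = \int_0^h e^{A_{\vartheta_i} u} B_{\vartheta_i} \Sigma^L_{\vartheta_i} B_{\vartheta_i}^T e^{A_{\vartheta_i}^T u} du$, defined by spectral calculus. This means that the two triples

$$ \left( e^{A_{\vartheta_1} h}, \left( \Sigma^{(h)}_{\vartheta_1} \right)^{1/2} O, C_{\vartheta_1} \right) \quad \text{and} \quad \left( e^{A_{\vartheta_2} h}, \left( \Sigma^{(h)}_{\vartheta_2} \right)^{1/2}, C_{\vartheta_2} \right) $$

are algebraic realizations of the same rational matrix function. Since Assumption C5 clearly implies the Kalman-Bertram criterion (3.11), it follows from Proposition 3.10 in conjunction with Assumption C3 that these realizations are minimal, and hence from Hannan and Deistler (1988, Theorem 2.3.4) that there exists an invertible matrix $T \in M_N(\mathbb{R})$ satisfying

$$ e^{A_{\vartheta_1} h} = T^{-1} e^{A_{\vartheta_2} h} T, \quad \left( \Sigma^{(h)}_{\vartheta_1} \right)^{1/2} O = T^{-1} \left( \Sigma^{(h)}_{\vartheta_2} \right)^{1/2}, \quad C_{\vartheta_1} = C_{\vartheta_2} T. \tag{3.12} $$

It follows from the power series representation of the matrix exponential that $T^{-1} e^{A_{\vartheta_2} h} T$ equals $e^{T^{-1} A_{\vartheta_2} T h}$. Under Assumption C5, the first equation in conjunction with Lemma 3.11 therefore implies that $A_{\vartheta_1} = T^{-1} A_{\vartheta_2} T$. Using this, the second of the three equations (3.12) gives

$$ \Sigma^{(h)}_{\vartheta_1} = \int_0^h e^{A_{\vartheta_1} u} (T^{-1} B_{\vartheta_2}) \Sigma^L_{\vartheta_2} (T^{-1} B_{\vartheta_2})^T e^{A_{\vartheta_1}^T u} du, $$

which, by Lemma 3.12, implies that $(T^{-1} B_{\vartheta_2}) \Sigma^L_{\vartheta_2} (T^{-1} B_{\vartheta_2})^T = B_{\vartheta_1} \Sigma^L_{\vartheta_1} B_{\vartheta_1}^T$. Together with the last of the equations (3.12) and Proposition 3.4 it follows that $f_{\vartheta_1} = f_{\vartheta_2}$, which contradicts Assumption C4 that $Y_{\vartheta_1}$ and $Y_{\vartheta_2}$ are not $L^2$-observationally equivalent. □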



3.5. Asymptotic properties of the QML estimator 

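In this section we apply the theory that we developed in Section 2 for the QML estimation of general discrete-time linear state space models to the estimation of continuous-time linear state space models or, equivalently, multivariate CARMA processes. We have already seen that a discretely observed MCARMA process can be represented by a discrete-time state space model and that, thus, a parametric family of MCARMA processes induces a parametric family of discrete-time state space models. Eqs. (3.9) show that sampling with spacing $h$ maps the continuous-time state space models $(A_\vartheta, B_\vartheta, C_\vartheta, L_\vartheta)_{\vartheta \in \Theta}$ to the discrete-time state space models

$$ \left( e^{A_\vartheta h}, C_\vartheta, N^{(h)}_\vartheta, 0 \right)_{\vartheta \in \Theta}, \quad N^{(h)}_{\vartheta,n} = \int_{(n-1)h}^{nh} e^{A_\vartheta (nh-u)} B_\vartheta \, dL_\vartheta(u), \tag{3.13} $$

which are not in the innovations form (1.2). The QML estimator $\hat{\vartheta}^{L,(h)}$ is defined by Eq. (2.15), applied to the state space model (3.13), that is

$$ \hat{\vartheta}^{L,(h)} = \operatorname{argmin}_{\vartheta \in \Theta} \widehat{\mathscr{L}}^{(h)}(\vartheta, y^{L,(h)}), \tag{3.14a} $$

$$ \widehat{\mathscr{L}}^{(h)}(\vartheta, y^{L,(h)}) = \sum_{n=1}^{L} \left[ d \log 2\pi + \log \det V^{(h)}_\vartheta + \hat{\varepsilon}^{(h),T}_{\vartheta,n} \left( V^{(h)}_\vartheta \right)^{-1} \hat{\varepsilon}^{(h)}_{\vartheta,n} \right], \tag{3.14b} $$

where $\hat{\varepsilon}^{(h)}_\vartheta$ are the pseudo-innovations of the observed process $Y^{(h)} = Y^{(h)}_{\vartheta_0}$, which are computed from the sample $y^{L,(h)} = (Y^{(h)}_1, \ldots, Y^{(h)}_L)$ via the recursion

$$ \hat{X}_{\vartheta,n} = \left( e^{A_\vartheta h} - K^{(h)}_\vartheta C_\vartheta \right) \hat{X}_{\vartheta,n-1} + K^{(h)}_\vartheta Y^{(h)}_{n-1}, \quad \hat{\varepsilon}^{(h)}_{\vartheta,n} = Y^{(h)}_n - C_\vartheta \hat{X}_{\vartheta,n}, \quad n \in \mathbb{N}. $$

The initial value $\hat{X}_{\vartheta,1}$ may be chosen in the same ways as in the discrete-time case. The steady-state Kalman gain matrices $K^{(h)}_\vartheta$ and pseudo-covariances $V^{(h)}_\vartheta$ are computed as functions of the unique positive definite solution $\Omega^{(h)}_\vartheta$ to the discrete-time algebraic Riccati equation

$$ \Omega^{(h)}_\vartheta = e^{A_\vartheta h} \Omega^{(h)}_\vartheta e^{A_\vartheta^T h} + \Sigma^{(h)}_\vartheta - \left[ e^{A_\vartheta h} \Omega^{(h)}_\vartheta C_\vartheta^T \right] \left[ C_\vartheta \Omega^{(h)}_\vartheta C_\vartheta^T \right]^{-1} \left[ e^{A_\vartheta h} \Omega^{(h)}_\vartheta C_\vartheta^T \right]^T, $$

where $\Sigma^{(h)}_\vartheta$ denotes the covariance matrix of $N^{(h)}_{\vartheta,n}$, namely

$$ K^{(h)}_\vartheta = \left[ e^{A_\vartheta h} \Omega^{(h)}_\vartheta C_\vartheta^T \right] \left[ C_\vartheta \Omega^{(h)}_\vartheta C_\vartheta^T \right]^{-1}, \quad V^{(h)}_\vartheta = C_\vartheta \Omega^{(h)}_\vartheta C_\vartheta^T. $$

In order to obtain the asymptotic normality of the QML estimator for multivariate CARMA processes, it is therefore only necessary to make sure that Assumptions D1 to D10 hold for the model (3.13). The discussion of identifiability in the previous section allows us to specify accessible conditions on the parametrization of the continuous-time model under which the QML estimator is strongly consistent. In addition to the identifiability assumptions C3 to C5, we impose the following conditions.

Assumption C6. The parameter space $\Theta$ is a compact subset of $\mathbb{R}^r$.

Assumption C7. The functions $\vartheta \mapsto A_\vartheta$, $\vartheta \mapsto B_\vartheta$, $\vartheta \mapsto C_\vartheta$, and $\vartheta \mapsto \Sigma^L_\vartheta$ are continuous. Moreover, for each $\vartheta \in \Theta$, the matrix $C_\vartheta$ has full rank.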

Lemma 3.14. Assumptions C1 to C3, C6 and C7 imply that the family $(e^{A_\vartheta h}, C_\vartheta, N^{(h)}_\vartheta, 0)_{\vartheta \in \Theta}$ of discrete-time state space models satisfies Assumptions D1 to D4.

Proof. Assumption D1 is clear. Assumption D2 follows from the observation that the functions $A \mapsto e^A$ and $(A, B, \Sigma) \mapsto \int_0^h e^{Au} B \Sigma B^T e^{A^T u} du$ are continuous. By Assumptions C2, C6 and C7, and the fact that the eigenvalues of a matrix are continuous functions of its entries, it follows that there exists a positive real number $\epsilon$ such that, for each $\vartheta \in \Theta$, the eigenvalues of $A_\vartheta$ have real parts less than or equal to $-\epsilon$. The observation that the eigenvalues of $e^A$ are given by the exponentials of the eigenvalues of $A$ thus shows that Assumption D3, i) holds with $\rho := e^{-\epsilon h} < 1$. Assumption C1 that the matrices $\Sigma^L_\vartheta$ are non-singular and the minimality assumption C3 imply by Corollary 3.9 that the noise covariance matrices $\Sigma^{(h)}_\vartheta = \mathbb{E} N^{(h)}_{\vartheta,n} N^{(h)T}_{\vartheta,n}$ are non-singular, and thus Assumption D3, ii) holds. Further, by Proposition 2.1, the matrices $\Omega_\vartheta$ are non-singular, and so are, because the matrices $C_\vartheta$ are assumed to be of full rank, the matrices $V_\vartheta$; this means that Assumption D3, iii) is satisfied. Assumption D4 is a consequence of Proposition 3.6, which states that the noise sequences $N_\vartheta$ are i.i.d. and in particular ergodic; their second moments are finite because of Assumption C1. □
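To make the definition of the QML objective in this section concrete, the following sketch evaluates it for given continuous-time matrices and an observed sample: it samples the model, approximates the solution of the Riccati equation by fixed-point iteration, and runs the pseudo-innovation recursion. Several choices here are assumptions rather than prescriptions of the text: the Riccati solver (plain iteration started at the noise covariance), the initial value $\hat{X}_{\vartheta,1} = 0$, all function names, and the example data, which are an arbitrary deterministic sequence rather than a simulated CARMA path.

```python
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

def sampled_model(A, B, Sigma_L, h):
    """F = e^{Ah} and noise covariance Sigma_h = Gamma_0 - F Gamma_0 F^T."""
    Gamma0 = solve_continuous_lyapunov(A, -(B @ Sigma_L @ B.T))
    F = expm(A * h)
    return F, Gamma0 - F @ Gamma0 @ F.T

def steady_state_kalman(F, C, Sigma_h, n_iter=500):
    """Approximate Omega by iterating the Riccati map; return (Omega, K, V)."""
    Omega = Sigma_h.copy()
    for _ in range(n_iter):
        G = F @ Omega @ C.T
        V = C @ Omega @ C.T
        Omega = F @ Omega @ F.T + Sigma_h - G @ np.linalg.solve(V, G.T)
    V = C @ Omega @ C.T
    K = F @ Omega @ C.T @ np.linalg.inv(V)
    return Omega, K, V

def qml_objective(A, B, C, Sigma_L, h, y):
    """Sum over n of d log(2 pi) + log det V + eps_n^T V^{-1} eps_n."""
    F, Sigma_h = sampled_model(A, B, Sigma_L, h)
    _, K, V = steady_state_kalman(F, C, Sigma_h)
    d = C.shape[0]
    V_inv = np.linalg.inv(V)
    log_det_V = np.linalg.slogdet(V)[1]
    x_hat = np.zeros(A.shape[0])          # assumed initial value of the state predictor
    total = 0.0
    for y_n in y:
        eps = y_n - C @ x_hat             # pseudo-innovation
        total += d * np.log(2 * np.pi) + log_det_V + eps @ V_inv @ eps
        x_hat = (F - K @ C) @ x_hat + K @ y_n
    return total

# Hypothetical example model and an arbitrary deterministic data sequence.
A = np.array([[-1.0, 0.5], [0.0, -2.0]])
B = np.array([[1.0], [1.0]])
C = np.array([[1.0, 0.0]])
Sigma_L = np.array([[2.0]])
h = 0.5
y = np.sin(0.3 * np.arange(50))[:, None]

value = qml_objective(A, B, C, Sigma_L, h, y)
print(value)
```

An estimator in the sense of this section would minimize `qml_objective` over a parametrization of $(A, B, C, \Sigma^L)$ with a numerical optimizer; that step is omitted here.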

In order to be able to show that the QML estimator $\hat{\vartheta}^{L,(h)}$ is asymptotically normally distributed, we impose the following conditions in addition to the ones described so far.

Assumption C8. The true parameter value $\vartheta_0$ is an element of the interior of $\Theta$.

Assumption C9. The functions $\vartheta \mapsto A_\vartheta$, $\vartheta \mapsto B_\vartheta$, $\vartheta \mapsto C_\vartheta$, and $\vartheta \mapsto \Sigma^L_\vartheta$ are three times continuously differentiable.

Assumption C10. There exists a positive number $\delta$ such that $\mathbb{E} \|L_{\vartheta_0}(1)\|^{4+\delta} < \infty$.

Lemma 3.15. Assumptions C8 to C10 imply that Assumptions D6 to D8 hold for the model (3.13).

Proof. Assumption D6 is clear. Assumption D7 follows from the fact that the functions $A \mapsto e^A$ and $(A, B, \Sigma) \mapsto \int_0^h e^{Au} B \Sigma B^T e^{A^T u} du$ are not only continuous, but infinitely often differentiable. For Assumption D8 we need to show that the random variables $N := N^{(h)}_{\vartheta_0,1}$ have bounded $(4+\delta)$th absolute moments. It follows from Rajput and Rosiński (1989, Theorem 2.7) that $N$ is infinitely divisible with characteristic triplet $(\gamma, \Sigma, \nu)$, and that



The first factor on the right side is finite by Assumptions C6 and C9, the second by Assumption C10 and the equivalence of finiteness of the $\alpha$th absolute moment of an infinitely divisible distribution and finiteness of the $\alpha$th absolute moments of the corresponding Lévy measure restricted to the exterior of the unit ball (Sato, 1999, Corollary 25.8). The same corollary shows that $\mathbb{E} \|N\|^{4+\delta} < \infty$ and thus Assumption D8. □

Our final assumption is the analogue of Assumption D10. It will ensure that the Fisher information matrix of the QML estimator $\hat{\vartheta}^{L,(h)}$ is non-singular by imposing a non-degeneracy condition on the parametrization of the model.

Assumption C11. There exists a positive index $j_0$ such that the $[(j_0 + 2)d^2] \times r$ matrix



has rank $r$.







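of length $L$ from the discretely observed output process corresponding to the parameter value $\vartheta_0 \in \Theta$. Under Assumptions C1 to C7 the QML estimator $\hat{\vartheta}^{L,(h)} = \operatorname{argmin}_{\vartheta \in \Theta} \widehat{\mathscr{L}}(\vartheta, y^{L,(h)})$ is strongly consistent, i.e.

$$ \hat{\vartheta}^{L,(h)} \xrightarrow[L \to \infty]{\text{a.s.}} \vartheta_0. \tag{3.15} $$

If, moreover, Assumptions C8 to C11 hold, then $\hat{\vartheta}^{L,(h)}$ is asymptotically normally distributed, i.e.

$$ \sqrt{L} \left( \hat{\vartheta}^{L,(h)} - \vartheta_0 \right) \xrightarrow[L \to \infty]{d} \mathscr{N}(0, \Xi), \tag{3.16} $$

where the asymptotic covariance matrix $\Xi = J^{-1} I J^{-1}$ is given by

$$ I = \lim_{L \to \infty} L^{-1} \operatorname{Var}\left( \nabla_\vartheta \mathscr{L}(\vartheta_0, y^{L,(h)}) \right), \quad J = \lim_{L \to \infty} L^{-1} \nabla^2_\vartheta \mathscr{L}(\vartheta_0, y^{L,(h)}). \tag{3.17} $$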

Proof. Strong consistency of $\hat{\vartheta}^{L,(h)}$ is a consequence of Theorem 2.4 if we can show that the parametric family $(e^{A_\vartheta h}, C_\vartheta, N^{(h)}_\vartheta, 0)_{\vartheta \in \Theta}$ of discrete-time state space models satisfies Assumptions D1 to D5. The first four of these are shown to hold in Lemma 3.14. For the last one, we observe that, by Lemma 2.3, Assumption D5 is equivalent to the family of state space models (3.13) being identifiable from the spectral density. Under Assumptions C3 to C5 this is guaranteed by Theorem 3.13.

In order to prove Eq. (3.16), we shall apply Theorem 2.5 and therefore need to verify Assumptions D6 to D10 for the state space models $(e^{A_\vartheta h}, C_\vartheta, N^{(h)}_\vartheta, 0)_{\vartheta \in \Theta}$. The first three hold by Lemma 3.15, the last one as a reformulation of Assumption C11. Assumption D9, that the strong mixing coefficients $\alpha$ of a sampled multivariate CARMA process satisfy $\sum_m [\alpha(m)]^{\delta/(2+\delta)} < \infty$, follows from Assumption C1 and Marquardt and Stelzer (2007, Proposition 3.34), where it was shown that MCARMA processes with a finite logarithmic moment are exponentially strongly mixing. □

4. Practical applicability 

In this section we complement the theoretical results from Sections 2 and 3 by commenting on their
applicability in practical situations. Canonical parametrizations are a classical subject of research about
discrete-time dynamical systems, and most of the results apply also to the continuous-time case; without going into
detail we present the basic notions and results about these parametrizations. The assertions of Theorem 3.16
are confirmed by a simulation study for a bivariate non-Gaussian CARMA process. Finally, we estimate
the parameters of a CARMA model for a bivariate time series from economics using our QML approach.



4.1. Canonical parametrizations 

We present parametrizations of multivariate CARMA processes that satisfy the identifiability conditions C3
and C4, as well as the smoothness conditions C7 and C9; if, in addition, the parameter space $\Theta$ is restricted
so that Assumptions C2, C5, C6 and C8 hold, and the driving Lévy process satisfies Assumption C1, the
canonically parametrized MCARMA model can be estimated consistently. In order for this estimate to be
asymptotically normally distributed, one must additionally impose Assumption C10 on the Lévy process
and check that Assumption C11 holds, a condition which we are unable to verify analytically for the
general model; for explicit parametrizations, however, it can be checked numerically with moderate
computational effort. The parametrizations are well-known from the discrete-time setting; detailed descriptions
with proofs can be found in Hannan and Deistler (1988) or, from a slightly different perspective, in the
control theory literature (Gevers, 1986, and references therein). We begin with a canonical decomposition
for rational matrix functions.



Theorem 4.1 (Bernstein (2005, Theorem 4.7.5)). Let $H \in M_{d,m}(\mathbb{R}\{z\})$ be a rational matrix function of
rank r. There exist matrices $S_1 \in M_d(\mathbb{R}[z])$ and $S_2 \in M_m(\mathbb{R}[z])$ with constant determinant, such that
$H = S_1 M S_2$, where

$$M = \begin{pmatrix} \operatorname{diag}\{\epsilon_i/\psi_i\}_{i=1}^{r} & 0_{r,m-r} \\ 0_{d-r,r} & 0_{d-r,m-r} \end{pmatrix} \in M_{d,m}(\mathbb{R}\{z\}), \qquad (4.1)$$

and $\epsilon_1, \ldots, \epsilon_r, \psi_1, \ldots, \psi_r \in \mathbb{R}[z]$ are monic polynomials uniquely determined by H satisfying the following
conditions: for each $i = 1, \ldots, r$, the polynomials $\epsilon_i$ and $\psi_i$ have no common roots, and for each
$i = 1, \ldots, r-1$, the polynomial $\epsilon_i$ divides the polynomial $\epsilon_{i+1}$ and $\psi_{i+1}$ divides $\psi_i$. The triple $(S_1, M, S_2)$ is called the
Smith-McMillan decomposition of H.

The degrees $\nu_i$ of the denominator polynomials $\psi_i$ in the Smith-McMillan decomposition of a rational
matrix function H are called the Kronecker indices of H, and they define the vector $\nu = (\nu_1, \ldots, \nu_d) \in \mathbb{N}^d$,
where we set $\nu_k = 0$ for $k = r+1, \ldots, d$. They satisfy the important relation $\sum_{i=1}^{d} \nu_i = \delta_M(H)$, where $\delta_M(H)$
denotes the McMillan degree of H, i. e. the smallest possible dimension of an algebraic realization of H,
see Definition 3.4. For $1 \leq i, j \leq d$, we also define the integers $\nu_{ij} = \min\{\nu_i + I_{\{i>j\}}, \nu_j\}$, and if the Kronecker
indices of the transfer function of an MCARMA process Y are $\nu$, we call Y an MCARMA$_\nu$ process.
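
As a small worked illustration of the integers $\nu_{ij}$, the sketch below computes them for given Kronecker indices. The helper names `nu_matrix` and `n_params_normalized` are hypothetical. The parameter count is an assumption on our part: it supposes d = m, all $\nu_i > 0$, that the normalization H(0) = H_0 fixes d·m entries, and that m(m+1)/2 covariance parameters are added; under these assumptions it reproduces the values n(ν) = 10, 11, 15 reported in Tables 1 and 2.

```python
def nu_matrix(nu):
    """Return the d x d list of integers nu_ij = min(nu_i + 1{i>j}, nu_j)."""
    d = len(nu)
    return [[min(nu[i] + (1 if i > j else 0), nu[j]) for j in range(d)]
            for i in range(d)]

def n_params_normalized(nu, m=2):
    """Count free parameters of a normalized echelon form (assumed counting rule).

    sum(nu_ij)       : free coefficients alpha_{ij,k} in A
    (N - d) * m      : entries of B left free after normalizing H(0)
    m * (m + 1) // 2 : parameters of the covariance matrix
    """
    d, N = len(nu), sum(nu)
    n_alpha = sum(sum(row) for row in nu_matrix(nu))
    return n_alpha + (N - d) * m + m * (m + 1) // 2

for nu in [(1, 1), (1, 2), (2, 1), (2, 2)]:
    print(nu, nu_matrix(nu), n_params_normalized(nu))
```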

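Theorem 4.2 (Echelon state space realization, Guidorzi (1975, Section 3)). For natural numbers d and
m, let $H \in M_{d,m}(\mathbb{R}\{z\})$ be a rational matrix function with Kronecker indices $\nu = (\nu_1, \ldots, \nu_d)$. Then a unique
minimal algebraic realization (A, B, C) of H of dimension $N = \delta_M(H)$ is given by the following structure.

(i) The matrix $A = (A_{ij})_{i,j=1,\ldots,d} \in M_N(\mathbb{R})$ is a block matrix with blocks $A_{ij} \in M_{\nu_i,\nu_j}(\mathbb{R})$ given by

$$A_{ij} = \begin{pmatrix} 0 & \cdots & \cdots & \cdots & \cdots & 0 \\ \vdots & & & & & \vdots \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 \\ \alpha_{ij,1} & \cdots & \alpha_{ij,\nu_{ij}} & 0 & \cdots & 0 \end{pmatrix} + \delta_{i,j} \begin{pmatrix} 0_{\nu_i-1} & I_{\nu_i-1} \\ 0 & 0_{\nu_i-1}^{T} \end{pmatrix}, \qquad (4.2a)$$

(ii) $B = (b_{ij}) \in M_{N,m}(\mathbb{R})$ unrestricted,
(iii) if $\nu_i > 0$, $i = 1, \ldots, d$, then

$$C = \begin{pmatrix} \begin{matrix} 1 & 0 & \cdots & 0 \\ & 0_{(d-1),\nu_1} & & \end{matrix} & \begin{matrix} & 0_{1,\nu_2} & & \\ 1 & 0 & \cdots & 0 \\ & 0_{(d-2),\nu_2} & & \end{matrix} & \cdots & \begin{matrix} & 0_{(d-1),\nu_d} & & \\ 1 & 0 & \cdots & 0 \end{matrix} \end{pmatrix}. \qquad (4.2b)$$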

If $\nu_i = 0$, the elements of the ith row of C are also freely varying, but we concentrate here on the
case where all Kronecker indices $\nu_i$ are positive. To compute $\nu$ as well as the coefficients $\alpha_{ij,k}$ and $b_{ij}$
for a given rational matrix function H, several numerically stable and efficient algorithms are available
in the literature (see, e. g., Rozsa and Sinha, 1975, and the references therein). The orthogonal invariance
inherent in spectral factorization (see Theorem 3.5) implies that this parametrization alone does not ensure
identifiability. One remedy is to restrict the parametrization to transfer functions H satisfying $H(0) = H_0$,
for a non-singular matrix $H_0$. To see how one must constrain the parameters $\alpha_{ij,k}$, $b_{ij}$ in order to ensure this
normalization, we work in terms of left matrix fraction descriptions.
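
The block structure of Theorem 4.2 can be made concrete with a short sketch that assembles A and C from the Kronecker indices and the coefficients $\alpha_{ij,k}$. The function name `echelon_AC` and the dictionary layout of `alpha` are hypothetical choices made for illustration, indices are 0-based in the code, and the sketch assumes that all $\nu_i$ are positive.

```python
import numpy as np

def echelon_AC(nu, alpha):
    """Assemble (A, C) of the echelon form, assuming all nu_i > 0.

    alpha[(i, j)] holds the list [alpha_{ij,1}, ..., alpha_{ij,nu_ij}] (0-based i, j).
    """
    d, N = len(nu), sum(nu)
    offs = [0]
    for n in nu:
        offs.append(offs[-1] + n)          # start index of each block
    A = np.zeros((N, N))
    C = np.zeros((d, N))
    for i in range(d):
        r0 = offs[i]
        for k in range(nu[i] - 1):
            A[r0 + k, r0 + k + 1] = 1.0    # shifted identity in the diagonal block
        last = r0 + nu[i] - 1              # last row of block row i carries the alphas
        for j in range(d):
            nij = min(nu[i] + (1 if i > j else 0), nu[j])
            coeffs = alpha[(i, j)]
            assert len(coeffs) == nij, "need exactly nu_ij coefficients"
            A[last, offs[j]:offs[j] + nij] = coeffs
        C[i, r0] = 1.0                     # output i reads the first state of block i
    return A, C

# Example with nu = (1, 2) and placeholder coefficients 1, ..., 5.
A, C = echelon_AC((1, 2), {(0, 0): [1.0], (0, 1): [2.0],
                           (1, 0): [3.0], (1, 1): [4.0, 5.0]})
print(A)
print(C)
```

With these inputs the sketch yields a 3 × 3 matrix A with rows (1, 2, 0), (0, 0, 1), (3, 4, 5), which has the same zero pattern as the ν = (1, 2) entry of Table 1.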

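Theorem 4.3 (Echelon MCARMA realization, Guidorzi (1975, Section 3)). For positive integers d and
m, let $H \in M_{d,m}(\mathbb{R}\{z\})$ be a rational matrix function with Kronecker indices $\nu = (\nu_1, \ldots, \nu_d)$. Assume that
(A, B, C) is a realization of H, parametrized as in Eqs. (4.2). Then a unique left matrix fraction description
$P^{-1}Q$ of H is given by $P(z) = [p_{ij}(z)]$, $Q(z) = [q_{ij}(z)]$, where

$$p_{ij}(z) = \delta_{i,j} z^{\nu_i} - \sum_{k=1}^{\nu_{ij}} \alpha_{ij,k} z^{k-1}, \qquad q_{ij}(z) = \sum_{k=1}^{\nu_i} \kappa_{\nu_1+\ldots+\nu_{i-1}+k,\,j}\, z^{k-1}, \qquad (4.3)$$

and the coefficient $\kappa_{i,j}$ is the (i, j)th entry of the matrix $K = TB$, where the matrix $T = (T_{ij})_{i,j=1,\ldots,d} \in M_N(\mathbb{R})$
is a block matrix with blocks $T_{ij} \in M_{\nu_i,\nu_j}(\mathbb{R})$ given by

$$T_{ij} = \begin{pmatrix} -\alpha_{ij,2} & \cdots & -\alpha_{ij,\nu_{ij}} & 0 & \cdots & 0 \\ \vdots & \iddots & \iddots & & & \vdots \\ -\alpha_{ij,\nu_{ij}} & \iddots & & & & \vdots \\ 0 & & & & & \vdots \\ \vdots & & & & & \vdots \\ 0 & \cdots & \cdots & \cdots & \cdots & 0 \end{pmatrix} + \delta_{i,j} \begin{pmatrix} 0 & \cdots & 0 & 1 \\ \vdots & \iddots & 1 & 0 \\ 0 & \iddots & \iddots & \vdots \\ 1 & 0 & \cdots & 0 \end{pmatrix}. \qquad (4.4)$$

$\nu = (1,1)$, $n(\nu) = 7$:
$A = \begin{pmatrix} \vartheta_1 & \vartheta_2 \\ \vartheta_3 & \vartheta_4 \end{pmatrix}$, $B = \begin{pmatrix} \vartheta_1 & \vartheta_2 \\ \vartheta_3 & \vartheta_4 \end{pmatrix}$, $C = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$

$\nu = (1,2)$, $n(\nu) = 10$:
$A = \begin{pmatrix} \vartheta_1 & \vartheta_2 & 0 \\ 0 & 0 & 1 \\ \vartheta_3 & \vartheta_4 & \vartheta_5 \end{pmatrix}$, $B = \begin{pmatrix} \vartheta_1 & \vartheta_2 \\ \vartheta_6 & \vartheta_7 \\ \vartheta_3 + \vartheta_5\vartheta_6 & \vartheta_4 + \vartheta_5\vartheta_7 \end{pmatrix}$, $C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \end{pmatrix}$

$\nu = (2,1)$, $n(\nu) = 11$:
$A = \begin{pmatrix} 0 & 1 & 0 \\ \vartheta_1 & \vartheta_2 & \vartheta_3 \\ \vartheta_4 & \vartheta_5 & \vartheta_6 \end{pmatrix}$, $B = \begin{pmatrix} \vartheta_7 & \vartheta_8 \\ \vartheta_1 + \vartheta_2\vartheta_7 & \vartheta_3 + \vartheta_2\vartheta_8 \\ \vartheta_4 + \vartheta_5\vartheta_7 & \vartheta_6 + \vartheta_5\vartheta_8 \end{pmatrix}$, $C = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 0 & 1 \end{pmatrix}$

$\nu = (2,2)$, $n(\nu) = 15$:
$A = \begin{pmatrix} 0 & 1 & 0 & 0 \\ \vartheta_1 & \vartheta_2 & \vartheta_3 & \vartheta_4 \\ 0 & 0 & 0 & 1 \\ \vartheta_5 & \vartheta_6 & \vartheta_7 & \vartheta_8 \end{pmatrix}$, $B = \begin{pmatrix} \vartheta_9 & \vartheta_{10} \\ \vartheta_1 + \vartheta_4\vartheta_{11} + \vartheta_2\vartheta_9 & \vartheta_3 + \vartheta_2\vartheta_{10} + \vartheta_4\vartheta_{12} \\ \vartheta_{11} & \vartheta_{12} \\ \vartheta_5 + \vartheta_8\vartheta_{11} + \vartheta_6\vartheta_9 & \vartheta_7 + \vartheta_6\vartheta_{10} + \vartheta_8\vartheta_{12} \end{pmatrix}$, $C = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix}$

Table 1. Canonical state space realizations (A, B, C) of normalized ($H(0) = -I_2$) rational transfer functions in $M_2(\mathbb{R}\{z\})$ with
different Kronecker indices $\nu$; the number of parameters, $n(\nu)$, includes three parameters for a covariance matrix $\Sigma^L$.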



The orders p, q of the polynomials P, Q satisfy $p = \max\{\nu_1, \ldots, \nu_d\}$ and $q \leq p - 1$. Using this
parametrization, there are different ways to impose the normalization $H(0) = H_0 \in M_{d,m}(\mathbb{R})$. One first observes that the
special structure of the polynomials P and Q implies that $H(0) = P(0)^{-1}Q(0) = -(\alpha_{ij,1})_{ij}^{-1} (\kappa_{\nu_1+\ldots+\nu_{i-1}+1,\,j})_{ij}$.
The canonical state space parametrization (A, B, C) given by Eqs. (4.2) therefore satisfies $H(0) = -CA^{-1}B = H_0$
if one makes the coefficients $\alpha_{ij,1}$ functionally dependent on the free parameters $\alpha_{ij,m}$, $m = 2, \ldots, \nu_{ij}$, and
$b_{ij}$ by setting $\alpha_{ij,1} = -[(\kappa_{\nu_1+\ldots+\nu_{k-1}+1,\,l})_{kl} H_0^{\sim 1}]_{ij}$, where $\kappa_{i,j}$ are the entries of the matrix K appearing in
Theorem 4.3 and $H_0^{\sim 1}$ is a right inverse of $H_0$. Another possibility, which has the advantage of preserving the
multi-companion structure of the matrix A, is to keep the $\alpha_{ij,1}$ as free parameters, and to restrict some of
the entries of the matrix B instead. Since $|\det T| = 1$ and the matrix T is thus invertible, the coefficients
$b_{ij}$ can be written as $B = T^{-1}K$. Replacing the $(\nu_1 + \ldots + \nu_{i-1} + 1, j)$th entry of K by the (i, j)th entry of
the matrix $-(\alpha_{kl,1})_{kl} H_0$ makes some of the $b_{ij}$ functionally dependent on the entries of the matrix A, and
results in a state space representation with prescribed Kronecker indices and satisfying $H(0) = H_0$. This
latter method also has the advantage that it does not require the matrix $H_0$ to possess a right inverse. In
the special case that $d = m$ and $H_0 = -I_d$, it suffices to set $\kappa_{\nu_1+\ldots+\nu_{i-1}+1,\,j} = \alpha_{ij,1}$. Examples of normalized
low-order canonical parametrizations are given in Tables 1 and 2.
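
The normalization can be checked numerically for a concrete case. The sketch below assumes the ν = (1, 2) structure with $A = (\vartheta_1, \vartheta_2, 0;\ 0, 0, 1;\ \vartheta_3, \vartheta_4, \vartheta_5)$, $B = (\vartheta_1, \vartheta_2;\ \vartheta_6, \vartheta_7;\ \vartheta_3 + \vartheta_5\vartheta_6, \vartheta_4 + \vartheta_5\vartheta_7)$ and $C = (1, 0, 0;\ 0, 1, 0)$, evaluates it at the parameter values $\vartheta_{1:7} = (-1, -2, 1, -2, -3, 1, 2)$ used in Section 4.2, and verifies $H(0) = -CA^{-1}B = -I_2$ as well as the relation $\kappa_{\nu_1+\ldots+\nu_{i-1}+1,\,j} = \alpha_{ij,1}$. The function name `table1_nu12` is hypothetical, and the matrix T is written out by hand from Eq. (4.4) for this particular ν.

```python
import numpy as np

def table1_nu12(theta):
    """Normalized echelon realization for nu = (1, 2) (structure assumed as in the lead-in)."""
    t1, t2, t3, t4, t5, t6, t7 = theta
    A = np.array([[t1, t2, 0.0], [0.0, 0.0, 1.0], [t3, t4, t5]])
    B = np.array([[t1, t2], [t6, t7], [t3 + t5 * t6, t4 + t5 * t7]])
    C = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
    return A, B, C

theta = [-1.0, -2.0, 1.0, -2.0, -3.0, 1.0, 2.0]
A, B, C = table1_nu12(theta)

# H(0) = -C A^{-1} B should equal -I_2 under the normalization.
H0 = -C @ np.linalg.solve(A, B)

# T for nu = (1, 2): T_11 = (1), T_22 = [[-alpha_{22,2}, 1], [1, 0]], off-diagonal blocks zero.
T = np.array([[1.0, 0.0, 0.0], [0.0, -A[2, 2], 1.0], [0.0, 1.0, 0.0]])
K = T @ B

print(H0)          # numerically -I_2
print(K[[0, 1]])   # rows nu_1+...+nu_{i-1}+1 of K, i = 1, 2
```

Rows 1 and 2 of K reproduce $(\alpha_{11,1}, \alpha_{12,1})$ and $(\alpha_{21,1}, \alpha_{22,1})$, which is the special case $H_0 = -I_d$ described above.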



4.2. A simulation study 

We present a simulation study for a bivariate CARMA process with Kronecker indices (1, 2), i. e. CARMA
indices (p, q) = (2, 1). As the driving Lévy process we chose a zero-mean normal-inverse Gaussian (NIG)
process $(L(t))_{t\in\mathbb{R}}$. Such processes have been found to be useful in the modelling of stock returns and
stochastic volatility, as well as turbulence data (see, e. g., Barndorff-Nielsen, 1997; Rydberg, 1997). The
distribution of the increments $L(t) - L(t-1)$ of a bivariate normal-inverse Gaussian Lévy process is
characterized by the density

$$f_{NIG}(x; \mu, \alpha, \beta, \delta, \Delta) = \frac{\delta \exp(\delta\kappa)}{2\pi}\, \frac{\exp(\langle \beta, x \rangle)}{\exp(\alpha g(x))}\, \frac{1 + \alpha g(x)}{g(x)^3}, \qquad x \in \mathbb{R}^2,$$
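$\nu = (1,1)$, $n(\nu) = 7$, $(p, q) = (1, 0)$:
$P(z) = \begin{pmatrix} z - \vartheta_1 & -\vartheta_2 \\ -\vartheta_3 & z - \vartheta_4 \end{pmatrix}$, $Q(z) = \begin{pmatrix} \vartheta_1 & \vartheta_2 \\ \vartheta_3 & \vartheta_4 \end{pmatrix}$

$\nu = (1,2)$, $n(\nu) = 10$, $(p, q) = (2, 1)$:
$P(z) = \begin{pmatrix} z - \vartheta_1 & -\vartheta_2 \\ -\vartheta_3 & z^2 - \vartheta_4 z - \vartheta_5 \end{pmatrix}$, $Q(z) = \begin{pmatrix} \vartheta_1 & \vartheta_2 \\ \vartheta_6 z + \vartheta_3 & \vartheta_7 z + \vartheta_5 \end{pmatrix}$

$\nu = (2,1)$, $n(\nu) = 11$, $(p, q) = (2, 1)$:
$P(z) = \begin{pmatrix} z^2 - \vartheta_1 z - \vartheta_2 & -\vartheta_3 \\ -\vartheta_4 z - \vartheta_5 & z - \vartheta_6 \end{pmatrix}$, $Q(z) = \begin{pmatrix} \vartheta_7 z + \vartheta_2 & \vartheta_8 z + \vartheta_3 \\ \vartheta_5 & \vartheta_6 \end{pmatrix}$

$\nu = (2,2)$, $n(\nu) = 15$, $(p, q) = (2, 1)$:
$P(z) = \begin{pmatrix} z^2 - \vartheta_1 z - \vartheta_2 & -\vartheta_3 z - \vartheta_4 \\ -\vartheta_5 z - \vartheta_6 & z^2 - \vartheta_7 z - \vartheta_8 \end{pmatrix}$, $Q(z) = \begin{pmatrix} \vartheta_9 z + \vartheta_2 & \vartheta_{10} z + \vartheta_4 \\ \vartheta_{11} z + \vartheta_6 & \vartheta_{12} z + \vartheta_8 \end{pmatrix}$

Table 2. Canonical MCARMA realizations (P, Q) with order (p, q) of normalized ($H(0) = -I_2$) rational transfer functions in
$M_2(\mathbb{R}\{z\})$ with different Kronecker indices $\nu$; the number of parameters, $n(\nu)$, includes three parameters for a covariance matrix $\Sigma^L$.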



parameter       sample mean    bias       sample std. dev.    mean est. std. dev.
$\vartheta_1$       -1.0001         0.0001    0.0354              0.0381
$\vartheta_2$       -2.0078         0.0078    0.0479              0.0539
$\vartheta_3$        1.0051        -0.0051    0.1276              0.1321
$\vartheta_4$       -2.0068         0.0068    0.1009              0.1202
$\vartheta_5$       -2.9988        -0.0012    0.1587              0.1820
$\vartheta_6$        1.0255        -0.0255    0.1285              0.1382
$\vartheta_7$        2.0023        -0.0023    0.0987              0.1061
$\vartheta_8$        0.4723        -0.0028    0.0457              0.0517
$\vartheta_9$       -0.1654         0.0032    0.0306              0.0346
$\vartheta_{10}$     0.3732         0.0024    0.0286              0.0378

Table 3. QML estimates for the parameters of a bivariate NIG-driven CARMA$_{1,2}$ process observed at integer times over the time
horizon [0, 2000]. The second column reports the empirical mean of the estimators as obtained from 350 independent paths; the third
and fourth columns contain the resulting bias and the sample standard deviation of the estimators, respectively, while the last column
reports the average of the expected standard deviations of the estimators as obtained from the asymptotic normality result
Theorem 3.16.



where

$$g(x) = \sqrt{\delta^2 + \langle x - \mu, \Delta(x - \mu) \rangle}, \qquad \kappa^2 = \alpha^2 - \langle \beta, \Delta\beta \rangle > 0,$$

and $\mu \in \mathbb{R}^2$ is a location parameter, $\alpha \geq 0$ is a shape parameter, $\beta \in \mathbb{R}^2$ is a symmetry parameter, $\delta \geq 0$ is
a scale parameter and $\Delta \in M_2(\mathbb{R})$, $\det \Delta = 1$, determines the dependence between the two components of
$(L(t))_{t\in\mathbb{R}}$. For our simulation study we chose parameters

$$\delta = 1, \quad \alpha = 3, \quad \beta = (1, 1)^T, \quad \Delta = \begin{pmatrix} 5/4 & -1/2 \\ -1/2 & 1 \end{pmatrix}, \quad \mu = -\frac{1}{2\sqrt{31}} (3, 2)^T, \qquad (4.5)$$

resulting in a skewed distribution with mean zero and covariance $\Sigma^L \approx \begin{pmatrix} 0.4751 & -0.1622 \\ -0.1622 & 0.3708 \end{pmatrix}$. A sample
of 350 independent replicates of the bivariate CARMA$_{1,2}$ process $(Y(t))_{t\in\mathbb{R}}$ driven by a normal-inverse
Gaussian Lévy process $(L(t))_{t\in\mathbb{R}}$ with parameters given in Eq. (4.5) was simulated on the equidistant time
grid 0, 0.01, . . . , 2000 by applying an Euler scheme to the stochastic differential equation (3.5) making
use of the canonical parametrization given in Table 1. For the simulation, the initial value $X(0) = 0_3$
and parameters $\vartheta_{1:7} = (-1, -2, 1, -2, -3, 1, 2)$ were used. Each realization was sampled at integer times
(h = 1), and QML estimates of $\vartheta_1, \ldots, \vartheta_7$ as well as $(\vartheta_8, \vartheta_9, \vartheta_{10}) := \operatorname{vech} \Sigma^L$ were computed by numerical
maximization of the quasi log-likelihood function using a differential evolution optimization routine (Price,
Storn and Lampinen, 2005) in conjunction with a subspace trust-region method. In Table 3 the sample means
and sample standard deviations of the estimates are reported. Moreover, the standard deviations were
estimated using the square roots of the diagonal entries of the asymptotic covariance matrix (2.21) with
$s(L) = \lfloor L/\log L \rfloor^{1/3}$, and the estimates are also displayed in Table 3. One sees that the bias, the difference
between the sample mean and the true parameter value, is very small in accordance with the asymptotic
consistency of the estimator. Moreover, the estimated standard deviation is always slightly larger than the
sample standard deviation, yet close enough to provide a useful approximation for, e. g., the construction
of confidence regions. In order not to underestimate the uncertainty in the estimate, such a conservative
approximation to the true standard deviations is desirable in practice. Overall, the estimation procedure
performs very well in the simulation study.
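
The simulation step described above can be sketched as follows. This is not the authors' code. It assumes that the state equation has the form dX(t) = A X(t) dt + B dL(t) with output Y(t) = C X(t), that NIG increments can be drawn through the normal variance-mean mixture representation with an inverse Gaussian mixing variable, and that A, B, C take the ν = (1, 2) structure discussed in Section 4.1. The names `nig_increments` and `euler_path` are hypothetical. The mean and covariance implied by the assumed mixture representation are computed in closed form so that they can be compared with the values stated in the text.

```python
import numpy as np

# NIG parameters: delta = 1, alpha = 3, beta = (1, 1)^T, Delta, mu (values as in Eq. (4.5)).
delta, alpha = 1.0, 3.0
beta = np.array([1.0, 1.0])
Delta = np.array([[1.25, -0.5], [-0.5, 1.0]])
mu = -np.array([3.0, 2.0]) / (2.0 * np.sqrt(31.0))
kappa = np.sqrt(alpha**2 - beta @ Delta @ beta)

# Moments of L(1) implied by the assumed mixture representation.
mean_theory = mu + (delta / kappa) * (Delta @ beta)
cov_theory = (delta / kappa) * (Delta + np.outer(Delta @ beta, Delta @ beta) / kappa**2)

def nig_increments(n, dt, rng):
    """Draw n increments over a step dt: mu*dt + Z*Delta*beta + sqrt(Z)*Delta^{1/2}*N,
    with Z inverse Gaussian (mean delta*dt/kappa, shape (delta*dt)^2)."""
    z = rng.wald(mean=delta * dt / kappa, scale=(delta * dt) ** 2, size=n)
    gauss = rng.standard_normal((n, 2)) @ np.linalg.cholesky(Delta).T
    return mu * dt + z[:, None] * (Delta @ beta) + np.sqrt(z)[:, None] * gauss

def euler_path(A, B, C, T_end, dt, h, rng):
    """Euler scheme for dX = A X dt + B dL, X(0) = 0, output sampled every h time units."""
    n_steps = int(round(T_end / dt))
    stride = int(round(h / dt))
    dL = nig_increments(n_steps, dt, rng)
    x = np.zeros(A.shape[0])
    ys = [C @ x]
    for n in range(n_steps):
        x = x + dt * (A @ x) + B @ dL[n]
        if (n + 1) % stride == 0:
            ys.append(C @ x)
    return np.array(ys)

# State space matrices for theta_{1:7} = (-1, -2, 1, -2, -3, 1, 2), nu = (1, 2).
A = np.array([[-1.0, -2.0, 0.0], [0.0, 0.0, 1.0], [1.0, -2.0, -3.0]])
B = np.array([[-1.0, -2.0], [1.0, 2.0], [-2.0, -8.0]])
C = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])

rng = np.random.default_rng(0)
path = euler_path(A, B, C, T_end=20.0, dt=0.01, h=1.0, rng=rng)  # short horizon for illustration
print(mean_theory, cov_theory, path.shape)
```

The short horizon keeps the sketch fast; the study itself uses the grid 0, 0.01, . . . , 2000 and 350 replicates, which the same loop would handle at correspondingly higher cost.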